AGARD-CP-303 


ADVISORY GROUP FOR AEROSPACE RESEARCH & DEVELOPMENT


7 RUE ANCELLE 92200 NEUILLY SUR SEINE FRANCE


AGARD  CONFERENCE  PROCEEDINGS  No.  303 

Tactical  Airborne  Distributed 
Computing  and  Networks 


DISTRIBUTION STATEMENT A

Approved for public release;
Distribution Unlimited


NORTH ATLANTIC TREATY ORGANIZATION


DISTRIBUTION  AND  AVAILABILITY 
ON  BACK  COVER 




AGARD-CP-303 


NORTH  ATLANTIC  TREATY  ORGANIZATION 
ADVISORY  GROUP  FOR  AEROSPACE  RESEARCH  AND  DEVELOPMENT 
(ORGANISATION  DU  TRAITE  DE  L’ATLANTIQUE  NORD) 


AGARD  Conference  Proceedings  No.303 
TACTICAL  AIRBORNE  DISTRIBUTED 
COMPUTING  AND  NETWORKS 




Copies  of  papers  and  discussions  presented  at  a  Meeting  of  the  Avionics  Panel 
held in Røros, Norway 22-25 June, 1981.


THE  MISSION  OF  AGARD 


The mission of AGARD is to bring together the leading personalities of the NATO nations in the fields of science
and  technology  relating  to  aerospace  for  the  following  purposes: 

-  Exchanging  of  scientific  and  technical  information; 

-  Continuously  stimulating  advances  in  the  aerospace  sciences  relevant  to  strengthening  the  common  defence 
posture; 

-  Improving  the  co-operation  among  member  nations  in  aerospace  research  and  development; 

-  Providing  scientific  and  technical  advice  and  assistance  to  the  North  Atlantic  Military  Committee  in  the  field 
of  aerospace  research  and  development; 

-  Rendering  scientific  and  technical  assistance,  as  requested,  to  other  NATO  bodies  and  to  member  nations  in 
connection  with  research  and  development  problems  in  the  aerospace  field; 

-  Providing assistance to member nations for the purpose of increasing their scientific and technical potential;

-  Recommending effective ways for the member nations to use their research and development capabilities for
the common benefit of the NATO community.

The  highest  authority  within  AGARD  is  the  National  Delegates  Board  consisting  of  officially  appointed  senior 
representatives  from  each  member  nation.  The  mission  of  AGARD  is  carried  out  through  the  Panels  which  are 
composed  of  experts  appointed  by  the  National  Delegates,  the  Consultant  and  Exchange  Programme  and  the  Aerospace 
Applications  Studies  Programme.  The  results  of  AGARD  work  are  reported  to  the  member  nations  and  the  NATO 
Authorities  through  the  AGARD  series  of  publications  of  which  this  is  one. 

Participation  in  AGARD  activities  is  by  invitation  only  and  is  normally  limited  to  citizens  of  the  NATO  nations. 


The  content  of  this  publication  has  been  reproduced 
directly  from  material  supplied  by  AGARD  or  the  authors. 


Published  October  1981 

Copyright  ©  AGARD  1981 
All  Rights  Reserved 

ISBN  92-835-0302-3 


Printed by Technical Editing and Reproduction Ltd
Harford House, 7-9 Charlotte St, London, W1P 1HD




THEME 


A  distributed  processing  system  has  been  characterized  as  having  a  multiplicity  of 
physically  distributed  resources  interacting  through  a  communication  network;  high-level 
operating  system  software  unifies,  controls,  and  integrates  the  components  and  provides 
transparency  to  services  rendered. 

The  distributed  system  architecture  offers  cooperative  autonomy  in  overall  operation 
to  achieve  efficient  use  of  avionic  resources  and  to  provide  high  system  integrity,  cost-effective 
maintenance,  expandability,  and  improved  performance.  The  physical  distribution  of  resources 
comprising  a  system  works  to  insure  immunity  to  battle  damage  and  accidents.  Also,  in  some 
instances systems for distributed computation may take the form of air-to-satellite, air-to-surface,
or  air-to-air.  The  advent  of  small,  inexpensive,  low-power  computing  revolutionized  complex 
systems  design,  and  raises  serious  questions  regarding  the  future  of  centralized,  hardwired 
avionics  computer  systems. 

Distributed  processing,  having  been  made  possible  by  the  price  performance  revolution 
in  micro-electronics,  now  challenges  us  to  correctly  apply  the  concept  to  alleviate  cost,  schedule, 
reliability,  operational,  and  maintenance  problems  in  avionic  systems. 




PROGRAM  AND  MEETING  OFFICIALS 


Chairman:  Mr  B.L.Dove,  US 

Program  Committee:  Mr  W.F.Ball,  US 

Mr  T.J.Sueta,  US 
Mr  O.Rossignol,  FR 
Ir.  H.A.Timmers,  NE 
Dr  G.  Van  Keuk,  GE 
Mr  R.Vaughn,  US 
Mr  R.Wright,  UK 


LOCAL  COORDINATOR 

Dr  L.Hoivik,  NO 
NDRE,  Div.  for  Electronics 
Kjeller,  NO 


AVIONICS  PANEL 


Chairman:  Dr  M. Vogel 
DFVLR e.V.
8031 Oberpfaffenhofen
Post Wessling/Obb
FRG 


Deputy  Chairman:  Mr  Y.Brault 

Thomson  CSF 
Division  Equipements 
Avioniques  et  Spatiaux 
178 Bld Gabriel Péri
92240  Malakoff 
FR 


PANEL  EXECUTIVE 

Lt.Col. J.B.Catiller
AGARD/NATO 
7,  rue  Ancelle 
92200  Neuilly-sur-Seine 
France 


CONTENTS 


Page 

THEME  iii

PROGRAM AND MEETING OFFICIALS  iv

TECHNICAL  EVALUATION  REPORT 

by  B.L.Dove  viii 

Reference 


SESSION I - STATE-OF-THE-ART IN DISTRIBUTED PROCESSING

DISTRIBUTED DATA PROCESSING - WHAT IS IT?
by P.H.Enslow, Jr.

Paper 2 cancelled

THE EFFECT OF INCREASINGLY MORE COMPLEX AIRCRAFT AND AVIONICS ON
THE METHOD OF SYSTEM DESIGN
by J.T.Martin  3

A TUTORIAL ON DISTRIBUTED PROCESSING IN AIRCRAFT/AVIONICS APPLICATIONS
by B.A.Zempolich  4

SUMMARY AND DISCUSSION  S1

SESSION II - DISTRIBUTED AIRBORNE SYSTEM ARCHITECTURE

Paper  5  cancelled 

PERFORMANCE  STUDY  OF  A  DISTRIBUTED  MICROPROCESSOR  ARCHITECTURE 
FOR  USE  ABOARD  MILITARY  AIRCRAFT 
by  K.G.Shin  and  C.M.Krishna 

THE DEVELOPMENT OF ASYNCHRONOUS MULTIPROCESSOR CONCEPTS FOR
FLIGHT  CONTROL  SYSTEM  APPLICATIONS 
by S.W.Wright and J.G.Brown

FUNCTIONAL VERSUS COMMUNICATION STRUCTURES IN MODERN AVIONIC SYSTEMS
by K.Brammer and A.Weimann

CONTINUOUS RECONFIGURATION IN A MULTI-MICROPROCESSOR FLIGHT CONTROL
SYSTEM
by S.L.Maher and S.J.Larimer

EXPERIENCES WITH THE EXPERIMENTAL FFM-MCS
by H.V.Issendorff

SUMMARY  AND  DISCUSSION 


SESSION III - DISTRIBUTED SYSTEM DESIGN APPROACHES

SAVANT - A DATABASE MANIPULATION TECHNIQUE FOR SYSTEM ARCHITECTURE
DESIGN VERIFICATION ANALYSIS
by A.A.Callaway  11

SIGNAL PROCESSING WITH SYSTOLIC ARRAYS
by R.W.Priester, K.Bromley, J.Clary and H.Whitehouse  12

ECONOMIC  CONSIDERATIONS  FOR  REAL-TIME  NAVAL  AIRCRAFT/AVIONIC 
DISTRIBUTED  COMPUTER  CONTROL  SYSTEMS 
by  B.A.Zempolich 

FUNCTIONAL  DOCUMENTATION  -  A  PRACTICAL  AID  TO  THE  ORDERLY  SOLUTION 
OF  THE  SYSTEM  DESIGN  PROBLEM 
by  J.T.Martin 

SUMMARY  AND  DISCUSSION 


SESSION  IV  -  DISTRIBUTED  SYSTEM  SOFTWARE 

A CONSISTENT APPROACH TO THE DEVELOPMENT OF SYSTEM REQUIREMENTS
AND SOFTWARE DESIGN
by A.O.Ward

A  PEARL  SOFTWARE  SYSTEM  FOR  MULTI-PROCESSOR  SYSTEMS 
by  P.Elzer  and  H.J. Schneider.  Presented  by  Mr  Schloch 

DISTRIBUTED AND DECENTRALIZED CONTROL IN FULLY DISTRIBUTED
PROCESSING SYSTEMS
by P.H.Enslow, Jr. Presented by Dr Livesey

RECOVERY IN DISTRIBUTED PROCESSING SYSTEMS
by L.Svobodova

GENERALIZED  POLLING  ALGORITHMS  FOR  DISTRIBUTED  SYSTEMS 
by  J.K.Wolf 


SUMMARY  AND  DISCUSSION 


SESSION  V  -  FAULT  TOLERANCE  AND  RELIABILITY  IN  DESIGNS 

STAGE-STATE  RELIABILITY  ANALYSIS  TECHNIQUE 
by A.D.Stern

METHODOLOGY FOR MEASUREMENT OF FAULT LATENCY IN A DIGITAL AVIONIC
MINIPROCESSOR
by J.G.McGough, F.Swern and S.J.Bavuso. Presented by Mr Moses

HIERARCHICAL  SPECIFICATION  OF  THE  SIFT  FAULT  TOLERANT  FLIGHT  CONTROL 
SYSTEM 

by  P.M.Melliar-Smith  and  R.L. Schwartz 

RECONFIGURATION:  A  METHOD  TO  IMPROVE  SYSTEMS  RELIABILITY 
by  J.Szlachta 

RESEAU  D’ECHANGE  RECONFIGURABLE  POUR  CONTROLE  DE  PROCESSUS  REPARTI 
par  Ch.Meraud  et  B.Maurel 

SUMMARY  AND  DISCUSSION 


SESSION  VI  -  INTERCONNECTION  -  BUSSING  AND  NETWORKING 


Paper  25  cancelled 


PROTOCOL  LEVEL  MODULES  -  FOR  COST  EFFECTIVE  STANDARD  COMPUTER 
COMMUNICATION 

by Ø.Hvinden, Y.Lundh and Ø.Sandholt



LES STRATEGIES DE RETRANSMISSION POUR LE CONTROLE D'ERREUR DANS LES
PROTOCOLES DE TRANSFERT DE DONNEES

par G.Juanole  27

PRACTICAL  ASPECTS  WHICH  APPLY  TO  MIL-STD-1553B  DATA  NETWORKS 

by  I.Moir  and  P.A.Duke  28 

THE  TRAFFIC  FLOW  IN  A  DISTRIBUTED  REALTIME  COMPUTING 
SYSTEM  (RDC-SYSTEM)  WITH  A  FIBER  OPTIC  RINGBUS  SYSTEM 

by D.Heger and R.Bahre  29

DISPERSED  SENSOR  PROCESSING  MESH  PROJECT 

by  V.A.Megna  30 

NEXT  GENERATION  MILITARY  AIRCRAFT  WILL  REQUIRE  HIERARCHICAL/MULTI- 
LEVEL  INFORMATION  TRANSFER  SYSTEMS 

by  J.W.McCuen  31 

SUMMARY  AND  DISCUSSION  S6 

SESSION  VII  -  APPLICATION  OF  DISTRIBUTED  SYSTEM  DESIGNS  TO  AVIONIC  SYSTEMS 

SIFT  -  AN  ULTRA-RELIABLE  AVIONIC  COMPUTING  SYSTEM 

by  K.Moses  32 

STATE-OF-THE-ART  COMPUTER  MONITORING  EQUIPMENT 

by  H.Nelson  33 

INTEGRATED  CONTROL  OF  MECHANICAL  SYSTEMS  FOR  FUTURE  COMBAT  AIRCRAFT 

by  G.W.Wilcock,  P. Lancaster  and  C.Moxey  34 

ARCHITECTURE  DU  SYSTEME  D’ARMES  DU  MIRAGE  2000 

par S.Cruce-Spinelli, B.Vandecasteele et J.F.Ferreri  35

THE  COMPUTER  SYSTEM  OF  THE  TORNADO* 

by  P.A.Bross  36 

F/A-18A TACTICAL AIRBORNE COMPUTATIONAL SUBSYSTEM

by T.V.McTigue  37

F/A-18 WEAPONS SYSTEM SUPPORT FACILITIES

by T.F.O'Neill  38

SUMMARY AND DISCUSSION  S7

LIST OF ATTENDEES  A


*Paper RESTRICTED. Copies available from author, see List of Attendees.


TACTICAL  AIRBORNE  DISTRIBUTED 
COMPUTING  AND  NETWORKS 

TECHNICAL  EVALUATION  REPORT 

Billy  L.  Dove 

Technical  Program  Chairman 


EXECUTIVE  SUMMARY  - 

CONCLUSIONS 

o  Benefits  credited  to  distributed  data  processing  are  not  capable  of  being  realized  within  the 
current  state-of-the-art. 

o  Preparation of military standards for airborne distributed data processing is inappropriate at
this time. However, a mechanism to promote uniformity in technical definitions between workers
in this area would be useful.

o  The state-of-the-art is not adequate to support the design and validation of airborne distributed
data processing systems for critical military missions.

o  The economic leverage of the military is no longer a factor with microelectronic manufacturers;
therefore, system designers must consider technology independence in their designs.

o  Software is of considerable importance to this area.


RECOMMENDATIONS 

o  AGARD  follow  up  on  this  subject  area  with  a  future  meeting. 

o  AGARD  support  specialist  meeting  on  Methodology  and  Design  Techniques  for  Distributed  Systems. 


GENERAL  - 

The symposium ran for three and a half days, June 22 to 25, 1981, in Røros, Norway.

Approximately  130  people  were  registered.  Attendance  at  all  sessions  was  unusually  high. 

Thirty-eight papers were scheduled for presentation as of June 22, 1981, and only two papers were
not presented at the meeting.

Few of the papers were invited ones. Even so, the material gathered for the program proved to be
of interest overall.

Many questions were asked; their answers are contained in the proceedings.

A major objective of this meeting was to seek a delineation of the state-of-the-art in airborne
distributed computing. Fortunately, this symposium attracted a large number of people representing
a broad range of interests, including academics, institutes, and avionic and airframe manufacturers.
This was as intended by the program committee.


TECHNICAL  SESSIONS  - 


The potential benefits from distributed computing system concepts, such as improved reliability/
availability, ease of system growth, shared resources, etc., offer attractive alternatives to today's
problems; however, the technical capability to realize these benefits has been brought into question.
Thus, the purpose of the meeting was established: to assess the state-of-the-art capability in
airborne distributed data processing.


The meeting was organized in such a way as to encourage a diverse response from the call for papers.
Seven  sessions  were  defined,  as  follows: 


Session I      State-of-the-Art
Session II     Architectures
Session III    Design Approaches
Session IV     Software
Session V      Fault Tolerance and Reliability
Session VI     Bussing and Networking
Session VII    Applications




Session I: State-of-the-Art in Distributed Processing. This was a tutorial session. The first
paper was invited and given extra time. It focused on the definition of distributed computing
systems. The matter of definition is important as it relates a name to a level of potential benefits.
This proved to be a very interesting and much needed paper as judged by the reaction of the audience.
A continuing need for the refinement of technical definitions was established. A major point from
this session was that the state-of-the-art is far from being able to provide the benefits claimed by
airborne distributed systems enthusiasts.

The introductory paper (1) written by Dr. Philip Enslow, USA, "Distributed Data Processing -
What Is It?" was presented by Dr. John Livesey. The paper focused on definitions which set
the scene for the entire symposium.

Mr. Martin's paper (3), "The Effect of Increasingly More Complex Aircraft and Avionics on
the Method of System Design," presented a historical treatment of aircraft and their systems.
His point was that little change was required in the design methodology for systems of the
past, but that a revolutionary change in methods is required in order to design distributed
systems.

Mr. Zempolich's paper (4), "A Tutorial on Distributed Processing in Aircraft/Avionics
Applications," dealt with systems architectural concepts from the analog to digital and beyond to the
hierarchical. Emphasis was placed upon the need for top-down design and the synergism
achievable from a team approach.

Following the three tutorial papers of the first session, the authors and Dr. Von Issendorff
participated in a free-exchange question and answer period. Arising from the many questions
and comments during this period was the subject of military standards and academic definitions.

Session II: Distributed Airborne System Architecture. The papers of this session revealed a tendency
to exploit the advances made in microcomputer technology by partitioning both hardware and software
into functional modules. The drivers for this are: a possible positive influence on reliability and
damage tolerance; use of identical hardware; cost of software; and better visibility into the system
for better maintainability. It was abundantly clear that system architects are at work putting new
technology to use in new ways. It is also clear that the resulting architectures are in general
following the same trend, i.e., distributed microprocessors, bus connected, and with some form of
dynamic redistribution of resources or functions. It seems logical and can be so argued that these
architectures offer benefits in cost and reliability. It was not established from this session that
a body of data exists which quantifies design factors and substantiates the claims made for the various
categories of architectures presented.

The paper by Dr. Shin (6), "Performance Study of a Distributed Microprocessor Architecture
for Use Aboard Military Aircraft," proposed a concept based upon the decomposition of a
mission into "atom functions" to be implemented by microelectronic technology. A central
controller communicates with the "atom functions," and the pilot interfaces with the central
controller. A hypothetical system was studied and some performance data generated.

Mr. Wright's paper (7), "The Development of Asynchronous Multiprocessor Concepts for Flight
Control System Applications," described a concept for the use of multiple microprocessors,
functionally dedicated, and running asynchronously. The concept will be implemented and
flown on a Hunter aircraft as a fly-by-wire system.

Mr. Brammer's paper (8), "Functional Versus Communication Structures in Modern Avionic
Systems," presented results from investigations into the number of interconnections in
several avionic system concepts. The bus concept and the layered star appear to offer
fewer interconnections from this analysis.
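
Why the number of interconnections differs so sharply between structures can be illustrated with a minimal sketch. It is not taken from Mr. Brammer's paper: the function name `link_count` and the three idealized topologies are assumptions made for illustration, and the "star" here is a single-level star with one of the units acting as hub, not the layered star of the paper.

```python
def link_count(n_units: int, topology: str) -> int:
    """Count the physical links needed to connect n_units (idealized, illustrative only)."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    if topology == "point_to_point":
        # Every unit wired directly to every other unit.
        return n_units * (n_units - 1) // 2
    if topology == "bus":
        # One stub per unit onto a single shared bus.
        return n_units
    if topology == "star":
        # Assumption: one unit is the hub, all others connect to it.
        return n_units - 1
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, {t: link_count(n, t) for t in ("point_to_point", "bus", "star")})
```

Under these assumptions the dedicated point-to-point count grows quadratically with the number of units, while the bus and star counts grow linearly.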

Lt.  Maher's  paper  (9),  "Continuous  Reconfiguration  in  a  Multi-Microprocessor  Flight  Control 
System,"  offered  another  concept  of  microcomputers  interconnected  with  busses  and  an  algorithm 
to  dynamically  redistribute  system  functions.  Three  advantages  of  this  architecture  were 
offered:  expandability,  reduction  of  software  cost,  and  reduction  of  unscheduled  maintenance. 

Dr. Von Issendorff's paper (10), "Experiences with the FFM-MCS," presented the design of a
test-bed for research on distributed data processing. Research tasks undertaken include
decomposition of data processing tasks into sets of functions, message construction, and
transfer protocol.

Session  III:  Distributed  System  Design  Approaches. 

Dr. Callaway's paper (11), "SAVANT - A Database Manipulation Technique for System Architecture
Design Verification and Analysis," described an interactive tool capable of representing the
various facets of a digital system design. SAVANT traces errors, identifies inconsistencies
in designs, and provides data for trade-off between different configurations.

Dr. Whitehouse's paper (12), "Signal Processing with Systolic Arrays," presented a specialized
hardware approach for fast matrix computation.

Mr. Zempolich's paper (13), "Economic Considerations for Real-Time Naval Aircraft/Avionic
Distributed Computer Control Systems," emphasized the economic aspects to be considered
during system design. Lack of economic leverage over microelectronic manufacturers has
resulted in questions about the viability of standardization and commonality.


Mr. Martin's paper (14), "Functional Documentation - A Practical Aid to the Orderly
Solution of the System Design Problem," discussed an organized approach to the
decomposition of system requirements from specification to functional detail. This
technique promotes communication between persons of different disciplines, and results
in a well-documented design.

Session IV: Distributed System Software. Considering the criticality of software to the realization
and success of distributed systems, it was surprising that this session received the least support in
papers. The individual papers were of sufficient quality; however, the scope and number were
inadequate. A final conclusion cannot be drawn regarding the state-of-the-art of software for distributed
systems.

Mr. Ward's paper (15), "A Consistent Approach to the Development of System Requirements
and Software Design," reported on the SAFRA (Semi-Automatic Functional Requirements
Analysis) project. Many of the ingredients of a careful analysis of requirements and
their mechanization were discussed. No comparison of SAFRA to other approaches was
mentioned. A limited experience base exists with SAFRA.

Dr. Livesey's paper (17), "Distributed and Decentralized Control in Fully Distributed
Systems," reinforced the definition of fully distributed systems through the discussion
of decentralized control. An important aspect of the paper was the task graph concept.
Information stored in task graphs could be useful in implementing the dynamics of
reconfiguration.

Dr. Svobodova's paper (18), "Recovery in Distributed Processing," discussed techniques
for analyzing task handling, confinement of failure effects, preservation of status,
and recovery in digital systems.

Dr. Wolf's paper (19), "Generalized Polling Algorithms for Distributed Systems," compared
two polling algorithms with the objective of eliminating the inefficiency of round-robin
polling. This theoretical paper offered no example to illustrate the amount of improvement
made.
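
The inefficiency of round-robin polling can be illustrated with a small sketch. It does not reproduce either algorithm from Dr. Wolf's paper; the group-probing scheme below is a swapped-in, textbook-style binary splitting method, and the function names `round_robin_polls` and `tree_polls` are hypothetical. It assumes a single probe can tell whether any station in a contiguous group has a message waiting.

```python
def round_robin_polls(n_stations: int) -> int:
    """Round-robin asks every station once per cycle, whether or not it has traffic."""
    return n_stations


def tree_polls(active: set, lo: int, hi: int) -> int:
    """Count probes used by binary group probing over stations lo..hi-1.

    Assumption (illustrative only): one probe reveals whether any station
    in the group is active; an active group of more than one station is
    split in half and each half is probed in turn.
    """
    probes = 1  # the probe of this group
    group_active = any(lo <= s < hi for s in active)
    if group_active and hi - lo > 1:
        mid = (lo + hi) // 2
        probes += tree_polls(active, lo, mid) + tree_polls(active, mid, hi)
    return probes


if __name__ == "__main__":
    n = 16
    for active in (set(), {5}, {2, 11}, set(range(n))):
        print(len(active), "active:", round_robin_polls(n), "vs", tree_polls(active, 0, n))
```

In this sketch group probing wins when few stations are active and loses when most are, which is the kind of trade-off a worked example would need to quantify.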

Session V: Fault Tolerance and Reliability in Designs. Three papers in this session demonstrated
great breadth in the consideration being given reliability assessment and validation. Emphasis on
ultrareliability and the validation problems for such systems is being worked in the civil R&D sector.

Mr. Stern's paper (20), "Stage-State Reliability Analysis Technique," presented an
improved method for reliability analysis for redundant systems. The stage-state technique
is simpler and less difficult to use, so fewer errors will be caused by the large number
of combinations in the analysis. Sources of unreliability become readily
apparent using this technique.

Mr. Moses' paper (21), "Methodology for Measurement of Fault Latency in a Digital Avionic
Miniprocessor," discussed the use of an emulator to conduct fault injection experiments.
The results from this work bring into question the fault/failure detection capability of
self-test programs in avionic systems, and the accuracy of reliability analysis program
results.

Dr. Schwartz's paper (22), "Hierarchical Specification of SIFT Fault Tolerant Flight
Control System," offered for consideration a formal mathematical proof of ultrareliable
computer functional and reliability requirements. This approach, which establishes a
mathematically provable relationship between the specification and the programs of the actual
system, is intriguing and novel.

Mr.  Szlachta's  paper  (23),  "Reconfiguration:  A  Method  to  Improve  Systems  Reliability," 
discussed  the  improvement  of  reliability  by  use  of  hardware  reconfiguration. 

Mr. Meraud's paper (24), "Reseau d'Echange Reconfigurable pour Controle de Processus
Reparti," discussed a means for dynamic distributed (decentralized) control of
reconfiguration. This technique is directed to systems of high reliability although no reliability
analysis results were given.

Session  VI:  Interconnection  -  Bussing  and  Networking. 

Mr. Hvinden's paper (26), "Protocol Level Modules - For Cost-Effective Standard Computer
Communications," described a "host independent" implementation of computer communication
protocols. This is realized by the development of hardware and software modules.
Workload on the host is reduced.

Mr. Juanole's paper (27), "Les Strategies de Retransmission pour le Contrôle d'Erreur dans
les Protocoles de Transfert de Données," presented the functions of a protocol for data
transfer. Error control is included in an error detection and data retransmission scheme.
This strategy was analyzed and the logic of the process explained.
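
The general idea of error control by detection and retransmission can be sketched as follows. This is a generic stop-and-wait illustration, not the specific strategies analyzed in Mr. Juanole's paper; the frame layout (payload followed by a 4-byte CRC-32) and the names `make_frame`, `check_frame`, `flaky_channel`, and `send_with_retransmission` are all assumptions made for the example.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, crc = frame[:-4], frame[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def flaky_channel(corrupt_first: int):
    """Hypothetical channel that flips one bit in the first `corrupt_first` frames."""
    state = {"sent": 0}

    def channel(frame: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= corrupt_first:
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return channel


def send_with_retransmission(payload: bytes, channel, max_attempts: int = 5):
    """Resend the frame until the receiver's CRC check passes.

    Returns (received_payload, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        received = check_frame(channel(make_frame(payload)))
        if received is not None:
            return received, attempt
    raise RuntimeError("frame not delivered within max_attempts")


if __name__ == "__main__":
    print(send_with_retransmission(b"nav", flaky_channel(2)))
```

The sketch omits acknowledgements, timeouts, and sequence numbers, all of which a real data transfer protocol would need.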

Mr. Duke's paper (28), "Practical Aspects Which Apply to MIL-STD-1553B Data Networks,"
discussed the ramifications of trying to satisfy two different standards: one relating
to data transmission and the other to standard interfacing electronics. This situation
is created when bus redundancy, multibus architecture, and some intelligence are required
in a stores management and weapons aiming system.




Mr. Heger's paper (29), "The Traffic Flow Measured in a Distributed Real Time Computing
System (RDC) With a Fiber Optic Ring Bus System," presented results from analysis and
measurements taken from a fault tolerant microprocessor system's fiber optic ring bus.
This is a very good example from which to compare the results of an analytical method
vs. practical data gathering. Many more examples will be required to validate either
method.

Mr. Megna's paper (30), "Dispersed Sensor Processing Mesh Project," presented the
mechanization of a limited port network communication structure. Although detailed
in implementation, neither the general analytical methods nor the general study results
were presented which compared performance to the existing F-8.

Mr. McCuen's paper (31), "Next Generation Military Aircraft Will Require Hierarchical/Multi-
Level Information Transfer Systems," was concerned with a discussion of a future
high-speed data bus standard and architectural approaches to it. Information was requested
to assist the task group in the formulation of a high order transfer-type system.

Session VII: Application of Distributed System Designs to Avionics Systems.

Mr. Moses' paper (32), "SIFT - An Ultra-Reliable Avionic Computing System," noted that
in recent years automatic flight control systems (FCS) in aircraft, which previously
provided mainly stability augmentation and other pilot-relief functions, have
taken on flight-critical tasks. These flight-critical tasks are those whose
successful accomplishment is vital to the safety of the aircraft (e.g., automatic
landing, fly-by-wire control system, control-configured vehicle methods). An FCS which
takes on these critical safety-related tasks must be ultrareliable. SIFT implements
software-implemented fault-tolerance techniques utilizing hardware redundancy.
Achievement of failure probabilities of 10^-10 per hour was quoted. Using multiprocessor "star
connection" techniques, the computing is carried out by high-speed 16-bit Bendix 930
processors (each with a throughput of 800 KOPS with an appropriate FCS instruction mix
and with 32K memory). Software algorithms are used for failure detection. After fault
detection and isolation, the software provides reconfiguration to accommodate the fault.

The paper described the SIFT architecture and its hardware implementation. As an
efficient approach to the design of ultrareliable avionics, the author noted that it
should pave the way for acceptance of fly-by-wire and other advanced FCS techniques.
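
The sequence of software failure detection followed by reconfiguration can be illustrated with a minimal majority-voting sketch. It is not the SIFT implementation: the function names `vote` and `reconfigure`, the processor identifiers, and the policy of simply dropping any processor that disagrees with the majority are assumptions made for illustration.

```python
from collections import Counter


def vote(outputs: dict):
    """Majority-vote the outputs of replicated tasks.

    outputs maps a processor id to the value it computed. Returns
    (majority_value, disagreeing_ids); majority_value is None when no
    value is held by a strict majority, in which case every processor
    is reported as suspect.
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, sorted(outputs)
    return value, sorted(p for p, v in outputs.items() if v != value)


def reconfigure(processors: list, faulty: list) -> list:
    """Hypothetical reconfiguration step: remove faulty units from the working set."""
    return [p for p in processors if p not in faulty]


if __name__ == "__main__":
    working = ["p1", "p2", "p3"]
    value, faulty = vote({"p1": 42, "p2": 42, "p3": 7})
    print(value, faulty, reconfigure(working, faulty))
```

With three replicas, one faulty output is outvoted and the disagreeing processor is identified, which is what allows the fault to be isolated before reconfiguration.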

Mr. Nelson's paper (33), "State-of-the-Art Computer Monitoring Equipment," described the
results of a significant software support effort at the NAVWPNCEN that has resulted in
the availability and practical use of an airborne-computer hardware monitor. This device,
called SOVAC (Software Validation And Control), provides a high capacity, real-time and
user-selective "window" that gives high visibility into the internal operation of the
tactical computer. SOVAC is a computer monitor that can conceptually be thought of in
terms of its basic components. These are: (1) Tactical Computer Interface. This section
provides real-time control of the tactical computer and provides the capability to capture
information available on the tactical computer's bus and control lines. (2) SOVAC
Controller. This high-speed, microprogrammed controller coordinates the operation of the
various subsystems. It has the capability to recognize various types of events or complex
combinations of events and set a breakpoint. It has a very flexible data selection and
logging capability. The functions of the controller are under the control of the user.
(3) User Interface. This part of the SOVAC is the part that the operator actually uses.
Its primary components are: a minicomputer, a terminal, an interface to the SOVAC
controller and the SOVAC software. The paper noted that SOVAC is a powerful tool for use
by anyone who has a need to know what is happening inside a tactical computer.

Mr. Wilcock's paper (34), "Centralized Management of Mechanical Systems for Future Combat
Aircraft," described a computer oriented approach to the management of aircraft mechanical
systems (fuel management, engine control, etc.). The approach described was a
microprocessor-based management system distributed throughout the airframe. It is planned
that these Systems Management Processors will operate independently as separate computing
centers and will be interconnected via a data bus (MIL-STD-1553B or a derivative). Some
of these microprocessors will act as remote terminals forwarding raw data via the bus
to designated processing points. The paper described the various mechanical systems to
be controlled, detailed the system architecture, described the mechanical system interface
with the microprocessor and speculated on the cockpit displays and pilot interface. The
system approach was seen not only to utilize current technology, but also to be able to take
advantage of future technology and to be adaptable at a reasonable cost and schedule to meet changing
system requirements.

Mr. Vandecasteele's paper (35), "Architecture du Système d'Armes du Mirage 2000," stated that
the architecture of the armament system of the Mirage 2000 represents an advanced generation
of digital systems. It was described from the points of view of digital equipment,
assignment of software to the equipment, digital links, and monitoring the system in flight. The
paper discussed architectural principles that embraced hardware, software, the distribution
of tasks, and corresponding interfaces. It is flexible enough to allow for the development
of a family of systems of different sizes and different operational needs. It was noted
that the architecture can be grossly characterized by the use of digital multiplex links
of the "Digibus" type, a standard for French military aeronautics since 1974. In outline,
the paper gave a general view of the Mirage 2000 system, including (1) a discussion of the
principal sensors (navigation, radar, EO, active and passive countermeasures); (2) display
and controls, all linked together by the standard Digibus technique; (3) the philosophy for
the distribution of the computing tasks; (4) discussion of integrated and centralized functions;
(5) architectures for the central computers and the Digibus; and (6) the development
methodology for the software.

Mr. Bross' paper (36), "Computer System of the Tornado," described the Tornado computer
system as one that, while not representative of a modern distributed computer system,
nevertheless has physically and functionally distributed computing power operating
through a dense digital network. The end result is therefore a highly integrated
system. The paper described the computing system and its architecture, viewing
especially the functional autonomy of various subsystems. The provision of system
integrity and fault tolerance incorporating redundancy and monitoring capability was
highlighted. The Tornado's computer system was seen to be of a hierarchical distributed
system type instead of an equally distributed mechanization. The (inner) lower part
serves for aircraft stability purposes and comprises the least intelligence but is highly
redundant; the middle part provides for basic autopilot functions and is less redundant;
the upper part serves for mission functions/modes and is simplex only. However, these
highest mission functions are divided into two master functions, one as master for the
horizontal plane (steering) and one for the vertical plane (terrain following). The paper
concluded with "lessons-learned" and remarked on improvements to be considered for a
next generation avionic system.

Mr.  McTigue's  paper  (37),  "F/A-18  Tactical  Airborne  Computer  and  Subsystem,"  presented 
a  description  of  the  Tactical  Airborne  Computational  Subsystem  used  in  the  U.S.  Navy/ 
McDonnell  Douglas  F/A-18A  Hornet  Fighter/Attack  Weapons  System.  The  F/A-18A  Hornet 
tactical  computer  subsystem  consists  of  two  central  mission  computers  and  a  number 
of  distributed  processors  embedded  in  various  sensors  and  display  subsystems.  This 
distributed processing system is interconnected by and communicates over a MIL-STD-1553A
serial 1-MHz command/response multiplex network. The distributed processing
system  architecture  was  discussed  and  the  rationale  was  presented  for  the  partitioning 
of  the  computational  tasks  between  the  central  mission  computers  and  the  distributed 
processors  embedded  in  the  sensor  subsystems.  The  salient  features  of  the  central 
mission  computer  and  the  distributed  processors  were  discussed  along  with  a  description 
of the functional operation of the interconnecting MIL-STD-1553A multiplex communications
system.  Finally,  the  development  process  for  the  Operational  Flight  Program  (OFP)  for 
the  central  mission  computers  was  described,  including  a  discussion  of  the  support 
facilities,  which  were  used  for  the  software  integration  and  validation. 

Mr.  O'Neill's  paper  (38),  "F/A-18  Weapon  System  Support  Facilities,"  described  the 
support  facility  tools  being  developed  by  the  Navy.  The  U.S.  Navy  is  currently 
acceptance-testing  the  McDonnell  Douglas  F/A-18  aircraft,  which  is  an  all-weather, 
fighter/attack  aircraft  with  more  than  30  on-board  computers  containing  more  than 
700K  words  of  programs.  Since  the  F/A-18  is  so  much  more  complex  than  any  aircraft 
currently  deployed,  sophisticated  tools  will  be  required  by  the  system  engineers  to 
support  the  avionics.  According  to  his  paper,  the  F/A-18  Weapons  System  Support 
Facility (WSSF) at the NAVWPNCEN, China Lake, CA will contain all of the support tools
(both  hardware  and  software)  necessary  to  test,  modify,  generate,  and  validate  all  of 
the  avionics  software,  hardware,  and  firmware.  The  WSSF  uses  several  minicomputers 
tied together in a distributed network to provide a realistic simulation of the aircraft
flight characteristics. Using this approach, the avionics computers can be
integrated  into  the  simulation  and  tested  in  the  WSSF  before  flight  testing  starts. 

The  WSSF  appears  to  be  well  under  way  in  development  and  should  ease  the  Navy's  task 
of  supporting  the  F/A-18  aircraft. 




DISTRIBUTED  DATA  PROCESSING  —  WHAT  IS  IT? 

Philip  H.  Enslow  Jr. 

Georgia  Institute  of  Technology 
School  of  Information  and  Computer  Science 
Atlanta,  Georgia  30332 

SUMMARY

Distributed  processing  has  been  presented  as  the  means  to  obtain  improvements  in  a  number  of  areas  of  system  performance.  Utilizing 
a  list  of  these  desired  improvements  as  the  motivational  factors,  this  paper  presents  the  key  design  characteristics  of  systems  that  will 
deliver a major proportion of these improvements. Because of the wide use of the term "distributed processing," the systems described
here are identified as "fully distributed."

1  BACKGROUND

1.1  Goals of Computer System Development

Although the state of the art in digital computers has certainly been advancing faster than any
other  technological  area  in  history,  it  is  somewhat  remarkable  that  the  goals  motivating  most  computer 
system  development  projects  have  remained  basically  unchanged  since  the  earliest  days.  Perhaps  the  most 
important  of  these  long  sought-after  improvements  are  the  following: 

1.  Increased system productivity

-  Greater capacity

-  Shorter response time

-  Increased throughput

2.  [...] reliability and availability

3.  [...] expansion and enhancement

4.  [...] growth and degradation

5.  [...] ability to share system resources

The "values" for these various goals cannot be expressed in absolute numbers, so it is [...]
continue to apply even though phenomenal advances have been made in many of them
[...]y, and reliability. What is perhaps more noteworthy and important to the discussion
being presented here is how little progress has been made in areas such as easy modular growth,
availability, adaptability, etc.

It seems that each new major systems concept or development (e.g., multiprogramming, multiprocessing,
networking, etc.) has been presented as "the answer" to achieving all of the goals listed above
plus many others. "Distributed processing" is no exception to this rule. In fact, many salesmen have
dusted off their old lists of benefits and are marketing today's distributed systems as the means to
achieve all of them. Table 1 lists some of the benefits currently being claimed for distributed processing
systems in current sales literature. Although some forms of distributed processing appear to offer
great promise as a possible means to make significant advances in many of the areas listed, the
state-of-the-art, particularly in system control software, is far from being able to deliver even a significant
proportion of these benefits today.

1.2  Approaches to Improving System Performance

Efforts to improve the performance of digital computer systems can address or be focused on a number
of major levels or design issues within the overall computer structure. These levels are:

1.  Materials - the basic materials used in the construction of operating devices such as
transistors, integrated circuits, or other switching devices.

2.  Devices - operating devices such as transistors, integrated circuits, junctions, etc.

3.  Switching circuits - design of circuits that provide fast and reliable logic operations.

4.  Register-transfer - assemblies such as registers, buses, shift registers, adders, etc.

5.  System architecture - algorithms for executing the basic functions such as arithmetic and
logic operations, interrupt mechanisms, control of processor and memory states, etc.

6.  System organization - the interconnection of major functional units such as control,
memory, I/O, arithmetic/logic units, etc., and the rules governing the flow of data and
control signals between these units. This level also considers the implementation of multiple,
parallel paths for simultaneous operations and transfers.

7.  Network organization - the number, characteristics, and topology of the interconnection of
"complete" systems and the rules governing the control and utilization of the resources
those systems provide.

8.  System software - control and support software for the effective management and utilization
of the hardware capabilities provided.




From the very beginning of the computer era there has been activity at all of these levels and such work
continues today. (To place it into proper perspective, it should be noted that the research work carried
on under this project is focused primarily at the three highest levels, system organization, network
organization, and system software, with some work at level 5, system architecture.)

1.3  Parallel Processing

An important theme of computer system development work at levels 5-8, "system architecture,"
"system organization," "network organization," and "system software," has been parallel processing.
Parallel processing has been implemented utilizing approaches focused primarily on the system hardware or
the software as well as integrated systems design.

Since the early days of computing, a direction of research that has offered high promise and
attracted much attention is "parallel computing." Work in this area dates from the late 1950's which saw
the development of the PILOT system [Lein58] at the National Bureau of Standards. The PILOT system
consisted of "three independently operating computers that could work in cooperation" [Ensl74]. (From
the information available, it appears that PILOT would be classified as a "loosely-coupled system"
today.) It is interesting to note that the evolution of parallel "hardware" systems led primarily to
the development of tightly-coupled systems such as the Burroughs D-825 and B-5000, the earliest examples
of the classical multiprocessor. Other development paths saw the introduction of specialized hardware
systems such as SOLOMON and the ILLIAC IV, examples of other forms of tightly-coupled processors.

1.3.1  System Coupling

System coupling refers to the means by which two or more computer systems exchange information. It
refers to both the physical transfer of such data and the manner in which the recipient of the
data responds to its contents. These two aspects of system interconnection are called "physical
coupling" and "logical coupling," and they are present in all multiple component systems whether the
components of interest are complete computers or some smaller assembly.

The terms "tight" and "loose" have been utilized to describe the mode of operation of each type of
coupling. (Some authors have utilized a third category, "medium coupling," and related it to a range of
data transfer speeds; however, history has clearly shown that basing any characterizations of digital
computers on speed, size, or even cost is an incorrect approach.) The interconnection and interaction of
two computer systems can then be described by specifying the nature of their physical coupling and the
nature of their logical coupling. It is important to point out that all four combinations of these
characteristics are possible and that they all have been observed in implemented systems.

1.3.1.1  Tightly-Coupled Computer Systems

During the 1960's and 1970's, activities in the development of parallel computing, specifically
multiple computer systems, were focused primarily on the development of tightly-coupled systems. These
tightly-coupled systems took the form of classical multiprocessors (i.e., shared main memory) as well as
specialized computation systems such as vector and array processors. This tight physical coupling resulted
in a sharing of the directly executable address space common to both processors. There was no means
by which the recipient of the data or information being transferred could refuse to physically accept it
- it was already there in his address space.

These early systems also usually implemented tight logical coupling. In this form of system
interaction, the recipient of a message is required to perform whatever service is specified therein.
With tight logical coupling, there is no independence of decision allowed regarding the performance of
the service or activity "requested." The relationship between the sender and recipient is basically that
of master-slave.

Although the concept of tightly-coupled multiprocessor systems appears to be a viable approach for
achieving almost unlimited improvements in performance (i.e., increases in system throughput) with the
addition of more processors, such have not been the results obtained with implemented systems. It is the
very nature of tight coupling that results in limitations on the improvements achievable. Some of the
ways that these limitations have manifested themselves are listed below.

1.  The direct sharing of resources (memory and input/output primarily) often results in
access conflicts and delays in obtaining use of the shared resource.

2.  User programming languages that support the effective utilization of tightly-coupled
systems have not been adequately developed. The programmer must still be directly
involved in job and task partitioning and the assignment of resources.

3.  The development of "optimal" schedules for the utilization of the processors is very
difficult except in trivial or static situations. Also, the inability to maintain perfect
synchronization between all processors often invalidates an "optimal" schedule soon after
it has been prepared.

4.  Any inefficiencies present in the operating system appear to be greatly exaggerated in a
tightly-coupled system.


There was also significant activity during these earlier periods in the development
of multiple computer systems characterized as "attached support processors (ASP)." These
systems were physically loosely-coupled; but, logically, they were tightly-coupled. The
earliest examples of this type of system organization were the use of attached processors
dedicated to input/output operations in large-scale batch processing systems. In the latter
part of the 1970's, specialized vector and array processors as well as other special-purpose
units such as fast Fourier transform units were being connected to general computational
systems and utilized as attached support processors. In any event, the
specialized nature of the services provided by the attached processors excludes them from
consideration as possible approaches to providing general-purpose computational support
such as that available from tightly-coupled general-purpose processors functioning as
multiprocessors.

Tightly-coupled  systems  certainly  do  have  a  role  to  play  in  the  total  spectrum  of 
computer  systems  organization;  however,  their  limitations  should  certainly  be  considered. 
It was the recognition of these limitations and the small amount of progress made in overcoming
them despite the expenditure of very large research efforts that contributed to the
decision  to  focus  our  current  research  program  on  loosely-coupled  systems. 


1.3.1.2  Loosely-Coupled Systems

Loosely-coupled systems are multiple computer systems in which the individual processors both
communicate physically and interact logically with one another at the "input/output level." There is no
direct sharing of primary memory, although there may be sharing of an on-line storage device such as a
disk in the interconnecting input/output communication path. The important characteristic of this type
of system organization and operation is that all data transfer operations between the two component
systems are performed as input/output operations. The unit of data transferred is whatever is permissible
on the particular input/output path being utilized; and, in order to complete a transfer, the
active cooperation of both processors is required (i.e., one might execute a READ operation in order to
accommodate or accept another's WRITE).
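
The READ/WRITE cooperation described above can be illustrated with a minimal sketch. The names IOPath and Processor are hypothetical and do not come from the paper; the sketch only assumes that a sender can offer data on an input/output path and that nothing reaches the receiver's private memory until the receiver itself issues a READ.

```python
from collections import deque


class IOPath:
    """Hypothetical input/output path between two loosely-coupled processors."""

    def __init__(self):
        # Data offered by a sender but not yet accepted by the receiver.
        self._pending_writes = deque()

    def write(self, data):
        # The sender can only offer data; it cannot place it in the receiver's memory.
        self._pending_writes.append(data)

    def read(self):
        # The transfer completes only when the receiver chooses to issue a READ.
        if not self._pending_writes:
            return None
        return self._pending_writes.popleft()


class Processor:
    """Illustrative processor with private primary memory (an assumption of this sketch)."""

    def __init__(self, name):
        self.name = name
        self.memory = []  # never shared directly with another processor

    def send(self, path, data):
        path.write(data)

    def receive(self, path):
        data = path.read()
        if data is not None:
            self.memory.append(data)
        return data


path = IOPath()
a, b = Processor("A"), Processor("B")
a.send(path, "sensor block 1")
memory_before_read = list(b.memory)  # B has not cooperated yet
b.receive(path)
memory_after_read = list(b.memory)   # the WRITE was matched by a READ
```

In this sketch the WRITE alone changes nothing in B's memory; the transfer is completed only by B's own READ, which is the sense in which the active cooperation of both processors is required.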


Probably the most important characteristic of loose logical coupling is that one processor does not
have the capability or authority to "force" another processor to do something. One processor can
"deliver" data to another; however, even if that data is a request (or a "demand") for a service to be
performed, the receiving processor, theoretically, has the full and autonomous right to refuse to
execute that request. The reaction of processors to such requests for service is established by the
operating system rules of the receiving processor, not by the transmitter. This allows the recipient of
a request to take into consideration "local" conditions in making the decision as to what actions to
take. It is important to note that it is possible for a system to be physically loosely-coupled but
logically tightly-coupled due to the rules embodied in the component operating systems, e.g., a permanent
master/slave relationship is defined. The reverse condition, tight physical and loose logical
coupling, is also possible.
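
The difference between tight and loose logical coupling can be sketched as follows. This is an illustration only: the Coupling and Node names, and the use of a simple load limit as the "local condition," are assumptions made for the example and are not taken from the paper.

```python
from enum import Enum


class Coupling(Enum):
    TIGHT = "tight"
    LOOSE = "loose"


class Node:
    """Hypothetical processor whose reaction to requests is set by its own rules."""

    def __init__(self, name, logical_coupling, max_load=2):
        self.name = name
        self.logical_coupling = logical_coupling
        self.max_load = max_load  # assumed local limit on accepted work
        self.accepted = []

    def handle(self, request):
        # Tight logical coupling: master-slave, the request must be performed.
        if self.logical_coupling is Coupling.TIGHT:
            self.accepted.append(request)
            return "performed"
        # Loose logical coupling: the receiver consults local conditions first
        # and may refuse, whatever the sender intended.
        if len(self.accepted) >= self.max_load:
            return "refused"
        self.accepted.append(request)
        return "performed"


slave = Node("slave", Coupling.TIGHT, max_load=1)
peer = Node("peer", Coupling.LOOSE, max_load=1)
slave_replies = [slave.handle(r) for r in ("job-1", "job-2")]
peer_replies = [peer.handle(r) for r in ("job-1", "job-2")]
```

The decision is made entirely inside handle, that is, by the rules of the receiving node; the sender's code is identical in both cases.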


1.3.2  Computer  Networks 

A computer network can be characterized as a physically loosely-coupled, multiple-computer system
in which the interconnection paths have been extended by the inclusion of data communications links.
Fundamentally there are no differences between the basic characteristics of computer network systems and
other loosely-coupled systems other than the data transfer rates normally provided. The transfer of data
between two nodes in the network still requires the active cooperation of both parties involved, but
there is no inherently required cooperation between the operation of the processors other than that which
they wish to provide.


1.3.3  Distributed Systems

Although there is a large amount of confusion, and often controversy, over exactly what is a
"distributed system," it is generally accepted that a distributed system is a multiple computer network
designed with some unity of purpose in mind. The processors, databases, terminals, operating systems,
and other hardware and software components included in the system have been interconnected for the
accomplishment of an identifiable, common goal. That goal may be the supplying of general-purpose computing
support, a collection of integrated applications such as corporate management, or embedded computer
support such as a real-time process control system.

This  research  program  is  concerned  with  a  very  specific  subclass  of  all  of  the  systems  currently 
being  designated  "distributed."  The  environment  of  interest  here  has  been  given  the  title  "Fully 
Distributed  Processing  System"  or  FDPS.  Section  2  discusses  the  general  characteristics  of  FDPS's. 


2  INTRODUCTION TO FULLY DISTRIBUTED PROCESSING SYSTEMS

2.1  Motivation  of  the  FDPS  Concept 

A large number of claims have been made as to the benefits that will be achieved with distributed
processing systems. As pointed out above, this list is very similar to the lists of "benefits to be
achieved" with several earlier computer technologies. However, each of those earlier solutions failed to
deliver its promises for various reasons. It was an examination of the "weaknesses" in the earlier
concepts and the development of a set of principles to overcome these obstacles that led to the concept
of "Fully Distributed Processing Systems" or, as it is commonly referred to, "FDPS."

The principle of parallel (i.e., simultaneous and/or concurrent) operation of a multiplicity of
resources continues to be perhaps the most important goal. The unique feature of FDPS's is the means or
environment in which this is attempted. A distributed system should exhibit a continual increase in
performance as additional processing components are added. The users should observe shorter response times
as well as an increase in total system throughput. In addition, the utilization of system resources
should be higher as a result of the system's ability to perform automatic load balancing, servicing a
large quantity and variety of user work requests. A distributed system should also permit the sharing of
data between cooperating users and the making available of specialized resources found only on certain
processors. In general, a distributed system should provide more facilities and a wider variety of
services than those that can be offered by any system composed of a single processor [Hopp79]. Another
important and highly desirable feature of such a system is extensibility. Extensibility might be
realized in several different ways. The system might support modular and incremental growth permitting
flexibility in its configuration, or it might support expansion in capacity, adding new functions, or
both. Finally, it might provide for incremental replacement and/or upgrading of system components,
either hardware or software. The executive control of the system is obviously the key to attaining these
goals, and it is in the area of executive control that some of the most significant deficiencies of
earlier systems have been found.




The major weaknesses in the executive control of earlier forms of parallel processing appear to result
from an excessive degree of centralization of control functions reflected in centralized decision making
or centralized maintenance of system status information or both of these. The net effect of these
aspects of control was to produce a rather tightly-coupled environment in which resources often were idle
waiting for work assignments and the failure of one major component often resulted in catastrophic and
total system failure. The solution to this problem is to force a condition of very loose coupling on
both the logical/control decision-making process and the physical linkages of components. This
property of "universal" loose coupling results in an environment in which the various components are
required to operate in an autonomous manner.

If a single design principle must be identified as the most important or central theme of FDPS
design, it is component autonomy or "cooperative autonomy" as described below. All of the other features
of the definition of Fully Distributed Processing Systems given below have resulted from determining what
is required to support and utilize the autonomous operation of the very loosely-coupled physical and
logical resources.

2.2  The Definition of an FDPS

Fully Distributed Processing Systems (FDPS) were first defined by Enslow in 1976 [Ensl78] although
the designation "fully" was not added until 1978 when it became necessary to clearly distinguish this
class of distributed processing from the many others being presented. An FDPS is distinguished by the
following characteristics:

1.  Multiplicity of resources - an FDPS is composed of a multiplicity of general-purpose
resources (e.g., hardware and software processors that can be freely assigned on a short-term
basis to various system tasks as required; shared data bases, etc.).

2.  Component interconnection - the active components in the FDPS are physically interconnected
by a communications network(s) that utilizes two-party, cooperative protocols to
control the physical transfer of data (i.e., loose physical coupling).

3.  Unity of control - the executive control of an FDPS must define and support a unified set
of policies (i.e., rules) governing the operation and utilization or control of all
physical and logical resources.

4.  System transparency - users must be able to request services by generic names, not being
aware of their physical location or even the fact that there may be multiple copies of the
resources present. (System transparency is designed to aid rather than inhibit and,
therefore, can be overridden. A user who is concerned about the performance of a
particular application can provide system specific information in order to aid in the
formulation of management control decisions.)

5.  Component autonomy - both the logical and physical components of an FDPS should interact
in a manner described as "cooperative autonomy" [Clar80, Ensl78]. This means that the
components operate in an autonomous fashion requiring cooperation among processes for the
exchange of information as well as for the provision of services. In a cooperatively
autonomous control environment, the components are afforded the ability to refuse requests
for service, whether they be execution of a process or the use of a file. This could
result in anarchy except for the fact that all components adhere to a common set of system
utilization and management policies expressed by the philosophy of the executive control.


2.2.1  Discussion of the Definitional Criteria

In order for a system to qualify as being fully distributed it must possess all five of the
criteria presented in this definition.
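
The "all five" rule can be expressed as a simple checklist. The sketch below is only an illustration: the field names are shorthand chosen here for the five criteria of the definition, and the functions is_fully_distributed and missing_criteria are hypothetical helpers, not part of any system described in the paper.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FdpsCriteria:
    """Shorthand flags for the five definitional criteria (names are illustrative)."""

    multiplicity_of_resources: bool
    component_interconnection: bool
    unity_of_control: bool
    system_transparency: bool
    component_autonomy: bool


def is_fully_distributed(criteria):
    # A system qualifies only if every one of the five criteria is satisfied.
    return all(getattr(criteria, f.name) for f in fields(criteria))


def missing_criteria(criteria):
    # List the criteria that keep a system from qualifying.
    return [f.name for f in fields(criteria) if not getattr(criteria, f.name)]


candidate = FdpsCriteria(
    multiplicity_of_resources=True,
    component_interconnection=True,
    unity_of_control=True,
    system_transparency=False,  # e.g., users must name physical locations
    component_autonomy=True,
)
qualifies = is_fully_distributed(candidate)
gaps = missing_criteria(candidate)
```

A system that satisfies four of the five criteria, as the candidate here does, may still be a distributed system in the looser sense, but it is not "fully distributed" under this definition.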

2.2.1.1  Multiple Resources and Their Utilization

The requirement for resource multiplicity concerns the assignable resources that a system provides.
Therefore, the type of resources requiring replication depends on the purpose of a system. For example,
a distributed system designed to perform real-time computing for air traffic control requires a
multiplicity of special-purpose air traffic control processors and display terminals. It is not required
that replicated resources be exactly homogeneous; however, they must be capable of providing the same
services.


In addition to this multiplicity, it is also required that the system resources be dynamically
reconfigurable to respond to a component failure(s). This reconfiguration must occur within a "short"
period of time so as to maintain the functional capabilities of the overall system without affecting the
operation of components not directly involved. Under normal operation the system must be able to
dynamically assign its tasks to components distributed throughout the system.

The extent to which resources are replicated can vary from those systems where none are replicated
(not a fully distributed system) to systems where all assignable resources are replicated. In addition,
the number of copies of a particular resource can vary depending on the system and type of resource. In
general, the greater the degree of replication, particularly of resources in high demand, the greater the
potential for attaining benefits such as increased performance (response time and throughput),
availability, reliability, and flexibility [Ensl78].
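
Dynamic assignment and reconfiguration over replicated resources can be sketched as below. The ResourcePool class, the least-loaded assignment rule, and the replica names are assumptions made purely for illustration; the paper does not prescribe a particular assignment policy.

```python
class ResourcePool:
    """Hypothetical pool of replicated resources that offer the same service."""

    def __init__(self, replicas):
        # Map each replica name to the list of tasks currently assigned to it.
        self.assignments = {name: [] for name in replicas}

    def assign(self, task):
        # Dynamic assignment: pick the least-loaded surviving replica
        # (ties broken by name so the result is deterministic).
        if not self.assignments:
            raise RuntimeError("no replica left to provide the service")
        target = min(self.assignments, key=lambda n: (len(self.assignments[n]), n))
        self.assignments[target].append(task)
        return target

    def fail(self, name):
        # Reconfiguration: only the failed replica's tasks are moved;
        # tasks on the other replicas are left untouched.
        orphaned = self.assignments.pop(name)
        return {task: self.assign(task) for task in orphaned}


pool = ResourcePool(["proc-a", "proc-b", "proc-c"])
placed = [pool.assign(t) for t in ("t1", "t2", "t3", "t4")]
moved = pool.fail("proc-a")
```

With three replicas the failure of one leaves the service available; with a single, unreplicated resource the same failure would raise the error in assign, which corresponds to the "none are replicated" end of the range described above.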




2.2.1.2  Component Interconnection and Communication

The extent of physical distribution of resources in distributed systems can vary from the length of
connection between components on a single integrated chip to the distance between two computers connected
through an international network. In addition, interconnection organizations can vary from a single bus
to a complex mesh network. Since a component in a distributed system communicates with other components
through its own logical process, all physical and logical resources can be thought of as processes, and
interactions between resources can be referred to as interprocess communication [Devi79]. For example,
the interaction of an application program with processors and data files is accomplished through
communication between logical processes.

Both the physical and logical coupling of the system components are characterized as "extremely
loose." "Gated" or "master-slave" control of physical transfer is not allowed. Communication, i.e., the
physical transfer of messages, is accomplished by the active cooperation of both the sender and
addressees. The primary requirement of the intercommunication subsystem is that it support a two-party
cooperative protocol. This is essential to enable the system's resources to exist in cooperative
autonomy at the physical level.

The advantages of using a message-based (loosely-coupled) communication system with a two-party
cooperative protocol include reliability, availability, and extensibility. The disadvantage is the
additional overhead of message processing incurred to support this method of communication. There are a
variety of interconnection organisations and communication techniques that can be used to support a
message-based system with a two-party cooperative protocol.
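
The two-party cooperative transfer described above can be sketched as follows. This is only an illustration in Python under stated assumptions: the names `Message`, `Component`, `offer`, `send` and the `willing` policy are hypothetical and do not come from the paper, which prescribes no implementation. The sketch shows the one property the text insists on: a transfer completes only when the addressee actively agrees to take the message, in contrast to a gated or master-slave transfer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    sender: str
    body: str


@dataclass
class Component:
    """A resource that decides for itself whether to accept a message."""
    name: str
    # Hypothetical acceptance policy: True if the component will take the message.
    willing: Callable[[Message], bool] = lambda msg: True
    inbox: List[Message] = field(default_factory=list)

    def offer(self, msg: Message) -> bool:
        """Addressee side of the protocol: accept or decline the transfer."""
        if self.willing(msg):
            self.inbox.append(msg)
            return True
        return False

    def send(self, dest: "Component", body: str) -> bool:
        """Sender side: the transfer completes only if the addressee cooperates."""
        return dest.offer(Message(self.name, body))
```

A sender that receives `False` knows the message was not taken and must decide what to do next; nothing is forced on the addressee.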

2.2.1.3 Unity of Control

In a fully distributed data processing system, individual processors will each have their own local
operating systems, which may or may not be unique, that control local resources. As a result, control is
distributed throughout the system to components that operate autonomously. However, to gain the benefits
of distributed processing it is required that the autonomous components of the system cooperate with each
other to achieve the overall objectives of the system. To ensure this, the concept of a high-level
operating system was created to integrate and unify, at least conceptually, the decentralized control of
the system.

A high-level operating system is essential to successfully implementing a distributed processing
system. This operating system is not a centralized block of code with strong hierarchical control over
the system, but rather it is a well-defined set of policies governing the integrated operation of the
system as a whole. To ensure reliable and flexible operation of the system, these policies should be
implemented with minimal binding to any of the system's components [Ensl78].

What policies are required and how they should be implemented depends greatly on the system. For
example, if it is a general-purpose system supporting interactive users, then a command interpreter and a
user control language will be required to make the system's components compatible and transparent to the
user.

2.2.1.4 Transparency of System Control

The high-level operating system also provides the user with his interface to the distributed
system. As a result, the user is accessing the system as a whole rather than just a host computer in the
network.

In order to increase the effectiveness of the distributed system, the actual system is made
transparent, and the user is presented with a virtual machine and a simplified command language to access
it. The user uses this language to request services by name and does not have to specify the specific
server to be used. Clearly, the same request might be assigned a different server depending on the state
of the total system when the request is made. However, to make the system truly effective for all users,
knowledgeable individuals must be able to interact with the system more intimately, requesting specific
servers or developing service routines to increase the efficiency or effectiveness of the system
[Ensl78].
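
Requesting a service by name can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `Directory` class, its `register` and `request` methods, and the least-loaded selection rule are hypothetical, since the paper does not say how a server is chosen. The sketch shows the two behaviours described above: the same named request may go to different servers as system state changes, and a knowledgeable user may still ask for a specific server.

```python
from typing import Dict, List, Optional


class Directory:
    """Hypothetical name service mapping a service name to candidate servers."""

    def __init__(self) -> None:
        self.providers: Dict[str, List[str]] = {}
        self.load: Dict[str, int] = {}

    def register(self, service: str, server: str, load: int = 0) -> None:
        self.providers.setdefault(service, []).append(server)
        self.load[server] = load

    def request(self, service: str, server: Optional[str] = None) -> str:
        """Return the server chosen for a request made by service name.

        Ordinary users omit `server`; a knowledgeable user may name one.
        """
        candidates = self.providers.get(service, [])
        if not candidates:
            raise LookupError(f"no server offers {service!r}")
        if server is not None:
            if server not in candidates:
                raise LookupError(f"{server!r} does not offer {service!r}")
            return server
        # Assumed policy: pick the least-loaded candidate at request time.
        return min(candidates, key=lambda s: self.load[s])
```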

2.2.1.5 Cooperative Autonomy

Cooperative autonomy has already been described at the physical interconnection level. It is also
required that all resources be autonomous at the logical control level. That is, a resource must have
full control of itself in determining which requests it will service and what future operations it will
perform. However, a resource must also cooperate with other resources by operating according to the
policies of the high-level operating system. Cooperative autonomy is an essential prerequisite for
systems to have fault tolerance and high degrees of extensibility [Ensl78]. It is perhaps the most
important as well as the most distinguishing characteristic of a fully distributed processing system.

2.2.2 Effects on System Organisation

Although the detailed design of the hardware and software required to implement an FDPS is still in
progress, it has been possible for some time to identify certain characteristics that these components
must have. One area in which certain criteria already appear reasonably well defined is the nature of
the organization of the following system components:

-  Hardware 

-  System  control  software 

-  Data  bases 

It should be noted that a number of definitions and descriptions of distributed systems in general are
based on the principle that one or more of these components is physically distributed. (Some such
discussions add to this list a fourth component, "processing or function;" however, considering the
distribution of processing independent from the distribution of hardware is quite improper. Why distribute
the hardware if it will not have some function to perform; similarly, how can the processing be
distributed without a corresponding distribution of the hardware? That would be processing on a truly
"virtual machine.")




An important characteristic of an FDPS is that, in order to meet the definitional criteria given
above while also attempting to provide as many as possible of the benefits listed in Table 1, all of the
three components listed above must be physically distributed and the degree of distribution must in each
case exceed a reasonably well-defined threshold. A diagram illustrating this requirement is shown in
Figure 1. The various organisations of each component, identified and positioned along each axis, are not
meant to be an exhaustive list. These points are listed to better identify the relative location of the
three thresholds defining the volume of space occupied by FDPS's. (It might also be noted that it seems
quite proper to characterise any system that is not in the "origin cube" as being "distributed" to some
degree.)
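
The threshold requirement can be restated as a small classification rule. The sketch below is hypothetical throughout: the axis names, the use of integer positions along each axis, and the convention that position 0 stands for the "origin cube" are assumptions made for illustration and are not taken from Figure 1. It only encodes the logic stated above: a system is fully distributed when all three components exceed their thresholds, and distributed to some degree when it lies outside the origin cube.

```python
from typing import Dict

# Assumed names for the three axes of distribution.
AXES = ("hardware", "control", "data_base")


def classify(position: Dict[str, int], threshold: Dict[str, int]) -> str:
    """Classify a system from its position on each axis of distribution.

    Positions and thresholds are hypothetical ordinal values; 0 is taken to
    mean the component lies in the "origin cube" on that axis.
    """
    if all(position[a] > threshold[a] for a in AXES):
        return "fully distributed"
    if any(position[a] > 0 for a in AXES):
        return "distributed to some degree"
    return "centralized"
```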

2.2.3 Some Excluded Systems

Considerable work has been done on new system designs to achieve subsets of these benefits, but
very few systems have made substantial progress toward meeting all of the criteria. Perhaps the most
widely known of these is Arpanet; however, only the communication subsystem of that network qualifies in
this respect. Many other systems, some of which are discussed elsewhere in this magazine, have made
substantial improvements in subsets of the areas of system performance; examples are the Honeywell
Experimental Distributed Processor, the Cm* system at Carnegie-Mellon University, Mininet at the
University of Waterloo and ICOPS at Brown University. However, the number of systems mislabeled as
distributed data processing systems far exceeds these.

Most of the criteria contained within the definition are met by crossing a threshold on a
particular dimension. The definition is not a set of binary criteria, and better understanding of these
criteria and their thresholds can be obtained by considering some systems that are excluded by the
definition.

It excludes, for example, distribution within a single mainframe. One writer has characterised the
architecture of several of the modern processor systems that include independent I/O channels as
"incorporating distributed processors since (it) contains separate I/O processors, arithmetic logic
processors and possibly diagnostic processors." [Gild76]. Such a categorisation has little utility and
has not found very wide acceptance. Obviously, there is a permanent binding of tasks to the various
components in this type of system organisation.

A front-end processor that controls communication with a mainframe definitely does not constitute
the type of distributed system defined here. Although it may meet some of our criteria, it also is
dedicated to one function and is not freely assignable.

Many instances of a master/slave relationship occur in both hardware and software control. The key
point is that the recipient of the information transferred, be it data or a control signal, cannot decide
whether or not to accept the transfer and act upon it. When this concept is implemented in hardware, it
is often referred to as gated transfer. In software control systems the master/slave relationship is
quite commonly encountered in multiple computer and basic multiprocessor operating systems.

The continued decline in the price of hardware has made more and more attractive new multiple-
processor system organizations incorporating specialized functional units, such as a vector multiplier, a
floating-point arithmetic unit, or a fast Fourier transform unit. In the general concept of operation,
such dedicated function processing is only slightly different from a master/slave relationship. The
major difference is that the master/slave control relationship also excludes many hardware systems
containing multiple general-purpose processing units from our definition. What causes some of the
terminology confusion with these configurations is that these specialized services are often provided by
a general-purpose unit, such as a programmable microprocessor. The functional unit may be "specialised"
by a microprogram, or it may be completely general but utilized in a dedicated functional role, such as a
minicomputer to control input/output in a larger system. The distinguishing characteristic of this class
of excluded systems is the dedication of the resource to a single or a fixed set of functions. It
operates in a master/slave mode, as far as the control over its own activities is concerned. The
criteria of both free assignment and autonomy are violated.

There is wide agreement (except perhaps among marketing and advertising people) that a single host
processor with a collection of remote terminals that simply collect and transmit data does not qualify as
a distributed data processing system, even if the terminals are intelligent and do some editing and
formatting.

Even the presence of multiple hosts in a complex network interconnection structure does not
necessarily make the system distributed. It may be distributed from the point of view of switching; but from
the point of view of overall operations and control, it usually is centralised. Systems such as these do
not have the capability for dynamic reallocation or reassignment of tasks in the event of hardware
failure.

Intelligent terminal systems are most often presented as distributed processing systems in
advertising copy. However, the operation of a system with intelligent terminals or local processors has
to be studied carefully to determine to what extent the processing is actually distributed. Such a
system (several are commercially available) consists of several terminals connected to a local processor
that has secondary storage capabilities, such as disks or cassettes. It offers intelligent data-entry or
field-editing and similar functions executed in the local processor through the execution of a program
stored there. It has shared file access, but only to local files. It communicates with a main
processor, but to do so, the local processor must emulate a "dumb" terminal in order to use normal protocols.
Finally, it is capable of remote job entry. There is no indication of any distribution of the control
function, for the distribution of work is fixed and a local terminal cannot affect it.




A terminal with a resident text editor, whether it is provided by hardware or software, is not an
example of a distributed data processing system. In order to meet the definition, the terminal must be
"smart" enough, first, to do some real work, and second, to recognize when it cannot accomplish its
assigned work and to pass it on to another appropriate service unit. The simple off-loading of work to a
higher level when this level is fully utilised is just the beginning of the transition to fully
distributed processing. If the terminal coordinates several concurrent and simultaneous remote jobs,
giving each a different type of service at a different location, without human intervention, then it more
closely resembles a distributed system. The threshold is reached when the local control system can
decide whether work should be done locally or passed on to the rest of the system, basing its decision on
an analysis of local workloads and capabilities. Distributed processing is definitely not equated with
merely "moving equipment to the periphery of a business system to capture and process data at the
source."
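
The threshold behaviour described above, where the local control system decides between local execution and passing work on, might look like the following sketch. The `LocalController` class, its fields, and the simple capacity test are hypothetical; the paper names the inputs to the decision (local workloads and capabilities) but not the rule itself.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class LocalController:
    """Hypothetical local control system of an intelligent terminal."""
    capabilities: Set[str]
    capacity: int          # assumed maximum number of concurrent local jobs
    active_jobs: int = 0

    def place(self, job_kind: str) -> str:
        """Decide where a job runs, from local capability and workload."""
        if job_kind in self.capabilities and self.active_jobs < self.capacity:
            self.active_jobs += 1
            return "local"
        # Cannot (or should not) do the work here: pass it to the rest of the system.
        return "forward"
```

A terminal that always returns "forward" for anything but editing would fall short of the threshold; the point is that the decision is made locally and depends on current state.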

Perhaps the intelligent terminal does have a role to play in the development of distributed
processing systems. It may facilitate a painless transition to more decentralized organisations for hardware
and data storage as well as control. This is accomplished by adding features to the local system and
making other modifications that increase the local functions, prior to establishing higher-level system
connections and a complete build-up of global functions.

2.3 High-level Operating Systems

The high-level operating system is a key ingredient in the distributed data processing system. Its
design must take into account several characteristics and problems.

The classical design for operating systems, as it has developed, assumes the availability of a
large amount of system information. Although the completeness and validity of information about the work
being presented by the user is questionable, the operating system is usually assumed to have access to
complete and accurate information about the environment in which it is functioning. This is not the case
in a distributed data processing system; complete information about the system will never be available.
The resources provide a service, but they may, either intentionally or unintentionally, shield information
from outside inspection.

In distributed systems, there will always be a time delay in the collection of information about
the status of the system components. The ramifications of these time delays are extremely important. In
a conventional centralized processor, the operating system can request status information, being assured
that the interrogated component will not change state while awaiting a decision based on that status
information, since only the single operating system asking the question may give commands. In a
distributed data processing system, the time lags that occur can become significant; as a result,
inaccurate (badly out-of-date) information can be transmitted because the autonomous component proceeds along
its own path. If you have ever worked with input/output device handlers, you surely have wondered
whether or not the information that has been obtained is accurate. For distributed data processing
systems, it will be essential to raise the degree of paranoia of the system designer to a much higher
level than for centralised systems. The system must be designed to work even with erroneous or
inaccurate status information.
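
One defensive habit that follows from this is to carry the observation time with every status report and to refuse to act on reports that are too old. The sketch below is an illustration only: `StatusReport`, `usable_state` and the `max_age` cut-off are hypothetical names, and an age check does not make a report accurate, since the component may have changed state immediately after reporting. It simply makes the "unknown" case explicit so the controller has to handle it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusReport:
    component: str
    state: str
    observed_at: float  # time at which the component reported this state


def usable_state(report: StatusReport, now: float, max_age: float) -> Optional[str]:
    """Return the reported state only if it is recent enough to act on.

    A None result means the controller must treat the state as unknown and
    choose an action that is safe whatever the component is doing now.
    """
    age = now - report.observed_at
    if age < 0 or age > max_age:
        return None
    return report.state
```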

A further complication with regard to system information available is the possibility of variations
in the information presented to different system controllers. These variations may be a result both of
time delays and of differences in the shielding of information from different controllers. As Le Lann has
observed, "This absence of uniqueness, both in time and in space, has very important
consequences." [LeLa77].

2.4 General Control

High-level operating systems as described here are highly nonhierarchical - that is, they are
single-level and have no internal master/slave relationships. This characteristic, combined with
component autonomy, greatly exacerbates the control problems. Even if autonomous multiple components are
cooperating, the probability of simultaneous conflicting actions is much higher than in hierarchical
systems. Also, synchronising the actions of the various controllers in the system is much more
difficult, because of the presence of appreciable time-lags. Finally, the problem of deadlocks or
infinite cycles within the system is quite different from that associated with other systems. Some
proposals call for an umpire (an outside third party) to solve this problem; however, such an umpire
would have to be transient, since the presence of a permanent umpire would denote an unacceptable degree
of hierarchical control.

From the operating characteristics of the distributed processing system, some conclusions can be
drawn about the nature of system communication. The second criterion of our definition requires a
message-type protocol for all transfers, both physical and logical, both in interprocess communications
and interprocessor communications. There must be no global variables and there must be no tunneling
across system components. All parameters must be passed across well-defined and rigidly enforced
interfaces.


Much of the work done on communication in uniprocessor and multiprocessor environments is
applicable, but extensions to the solutions found there are definitely required to cope with the
autonomous nature of the system components in the distributed system.

The user must communicate with the system by directives containing service names only. Our
criterion of system transparency makes it unnecessary, and perhaps impossible, for the user to designate the
system component offering a desired service. However, this requirement introduces new problems of system
failure and user error detection, since no one processor can establish whether the service requested can
be provided anywhere in the system, or even whether it is legal.




Resource management in a distributed processing system is a multidimensional job. Thus far, very
little work has been done on the aspects of resource management that apply specifically to distributed
processing systems. However, low-level functions are quite similar to those performed on uniprocessors;
they include physical resource allocation, and management of those facilities required by a process after
it has been scheduled on a particular system component. Before that can be done, however, the required
resources may have to be assembled at one location, or linkage mechanisms established so that they can be
used remotely. The problems that have to be addressed in that process are locating the resources,
determining which components are suitable, and determining the best way to move the resources to the
selected location. At an even higher level is the scheduling problem, determining when a function should
be initiated or terminated.

Any system exhibiting monolithic, autonomous control presents completely new problems in system
scheduling. A request for service in a nonhierarchical system might well result in an initial denial of
that service by all physical resources. In that instance, the requesting entity might initiate an
evaluation of relative priorities between the new request and currently executing tasks, followed perhaps
by bidding (priority adjustments) and preemption. The efficient execution of this procedure is one of
the most important functions of the high-level operating system.
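
The denial, bidding and preemption sequence can be sketched as a loop in which the requester raises its bid after every round of refusals. This is an assumed reading, not the procedure used in any actual FDPS: `Resource`, `accepts`, `schedule`, the integer priorities and the `max_bid` limit are all hypothetical. Note that each resource still makes its own decision, in keeping with cooperative autonomy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    running_priority: Optional[int] = None  # None means the resource is idle

    def accepts(self, priority: int) -> bool:
        """The resource itself decides: idle, or running lower-priority work."""
        return self.running_priority is None or priority > self.running_priority


def schedule(resources: List[Resource], priority: int, max_bid: int) -> Optional[str]:
    """Offer a request to every resource, raising the bid after each full denial."""
    for bid in range(priority, max_bid + 1):
        for res in resources:
            if res.accepts(bid):
                res.running_priority = bid  # preempts whatever was running
                return res.name
    return None  # denied everywhere even at the highest bid
```

Every round of bidding costs a full exchange of messages with all resources, which is why the efficiency of this procedure matters so much.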

When all of these problems and their possible solutions are compared to similar problems and
solutions encountered in uniprocessor systems, the major factor exacerbating the distributed system
control problem is seen as communication within the distributed data processing system, which is
asynchronous with respect to the detailed execution of the functions, and which exhibits time-lags in addition
to the communication processing time itself. Uniprocessors cope with many of the problems with
semaphores, flags, lockout gates, or timeouts. To attempt to do this in a reasonably complex distributed
system requires too much time, in the sense that such practices greatly reduce the throughput rate of the
system. Bear in mind that transit time for signals transmitting the semaphores is on the order of 100
milliseconds. In addition to the lowering of performance, the reliability and the robustness of most of
the uniprocessor solutions are in doubt, since a system operation such as TEST-and-SET cannot be
replicated as a single indivisible machine-level instruction that can be executed immediately on the next
machine cycle.
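
A rough calculation shows why the 100 millisecond transit time is so damaging. The sketch assumes, purely for illustration, that acquiring a remote semaphore takes two sequential message transits (a request and a grant) and that processing time is negligible; the function name `max_lock_rate` and the two-message assumption are not from the paper.

```python
def max_lock_rate(transit_s: float, messages_per_acquire: int = 2) -> float:
    """Upper bound on remote lock acquisitions per second for one requester.

    Assumes each acquisition needs `messages_per_acquire` sequential message
    transits (for example a request and a grant) and ignores processing time.
    """
    return 1.0 / (transit_s * messages_per_acquire)


# With the 100 ms transit time quoted above:
rate = max_lock_rate(0.100)
print(f"{rate:.0f} acquisitions per second at most")
```

Under these assumptions a single requester can complete only a handful of semaphore operations per second, whereas a uniprocessor performs a test-and-set within a machine cycle.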

The problem of time is further complicated by the fact that most of the procedures, such as voting
and software synchronisation, which have been presented as solutions to the difficulties introduced by
transit time, require even more processing by every component in the system.


2.5 Programming Languages for Distributed Systems

Four aspects of distributed processing systems have a significant impact on the goals of a language
design effort. First, data is stored throughout the system in a distribution which is in some sense
natural (for example, data may be stored where it is generated or it may be stored where it is easily
accessible to those who use it most frequently). Second, it may be infeasible to move data from node to
node for processing. Third, a single application may need to access data that is stored on a number of
different nodes. Fourth, a programmer should not need knowledge of where data is stored in order to
access it.

It should be noted that fully distributed processing does not necessarily require new programming
languages, much less new models on which to base programming languages. Any program can conceivably be
run on a distributed system; however, when a program needs to access data on multiple nodes, a single
thread of execution is unlikely to be executed efficiently. Furthermore, even languages with parallel
execution features are not adequate in a fully distributed environment. The key issue is that most
programming languages have not been designed to allow a programmer to provide information about the
nature of program execution or to describe the appropriate structural units of the program needed by an
operating system in order to make effective allocation and scheduling decisions in such an environment.
Thus, a major goal is to design language features that will elicit information and provide structural
units which will simplify allocation and scheduling decisions. Our other major goal is that in doing so,
the language should present a natural and helpful framework for the description of a large class of
programs.

The most important aspect of our initial design work is the model of computation on which our
language will be based. In order to explain the motivations for our computational model, we need to take
a closer look at fully distributed systems. Conceptually, a fully distributed system consists of a
number of independent machines (where a 'machine' may denote one or more processors) with communication
links between them. Each machine has a processing capability, a storage capability and a message
handling capability. Furthermore, each machine functions with a large degree of autonomy (the system as a
whole may make requests of the individual machines but has no control over how these requests are carried
out) and there is no memory shared between the machines.

In order to achieve our design goals, our computational model mimics the logical structure of a
fully distributed system. In this model, a program consists of a number of execution modules (which can
even be thought of as individual programs) and a network description. In other words, a program is a
network of execution modules. The execution modules are independent and contain local variable
declarations (there is no shared memory), port declarations and executable code. (Ports are used to
communicate with other execution modules and permanent storage facilities.) Port to port connections, port
to permanent storage connections and the execution of the execution modules are controlled by the network
description. Thus, the execution modules (because they resemble the machines of a fully distributed
system) are structural units which will simplify allocation and scheduling decisions. On the other hand,
the network description contains information which will enable an effective allocation and scheduling of
these units.
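
The model of a program as a network of execution modules can be mocked up in a few lines. The sketch below uses Python rather than the language under development, and every name in it (`Module`, `send`, `receive`, `run`, the `links` table) is hypothetical. It keeps the three ingredients named above: modules with purely local state, declared ports, and a separate network description that owns the port-to-port connections and the order of execution.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class Module:
    """An execution module: local state, named ports, executable code."""

    def __init__(self, name: str, ports: List[str], code: Callable[["Module"], None]):
        self.name = name
        self.ports: Dict[str, Deque] = {p: deque() for p in ports}
        self.out: List[Tuple[str, object]] = []  # (port, value) pairs awaiting delivery
        self.code = code

    def send(self, port: str, value: object) -> None:
        self.out.append((port, value))

    def receive(self, port: str) -> object:
        return self.ports[port].popleft()


def run(modules: List[Module], links: Dict[Tuple[str, str], Tuple[str, str]]) -> None:
    """Network description: run modules in order and route port-to-port messages."""
    by_name = {m.name: m for m in modules}
    for m in modules:
        m.code(m)
        for port, value in m.out:
            dest_mod, dest_port = links[(m.name, port)]
            by_name[dest_mod].ports[dest_port].append(value)
        m.out.clear()


# Two modules that share nothing except the link named in the network description.
results = []
producer = Module("producer", ["out"], lambda m: m.send("out", 21))
doubler = Module("doubler", ["in"], lambda m: results.append(2 * m.receive("in")))
run([producer, doubler], {("producer", "out"): ("doubler", "in")})
```

Because neither module refers to the other by name, either could be moved to a different node by changing only the network description.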

Two aspects of our model should be helpful in the writing, debugging and reading of programs.
First, because execution modules are independent (sharing only communication links and a common network
description), they may be developed, tested and understood separately. Second, because all of the
control and network communication specifications are contained within it, the network description
provides a meaningful abstract view of the program as a whole. This control and communication
abstraction should contribute significantly to the understandability of distributed programs and perhaps even to
programs written for existing system organisations (not meeting our criteria for 'fully' distributed).

2.5.1 Research Issues in 'Distributed' Language Design

Communication primitives in languages for distributed computing are one of the most important
issues that should be addressed by the participants at the workshop. The most obvious alternatives
include message-based communication and call-based communication (for example, the rendezvous in Ada).
However, the potential for the network description presented above to be an active program unit opens up
new possibilities. It could function as a communication controller for its execution modules, providing
any hybrid communication primitives desired by a programmer. The utility of such an approach requires
further examination. Due to our interest in very loosely coupled systems, messages are the most likely
candidate for our language currently being developed. However, for systems where loose coupling is not a
dominant consideration, use of communication primitives implemented by a network controller might prove
quite useful.

Another important issue raised by the model of programs presented above is how distributed programs
are to be described and controlled. A program might quite reasonably be composed of execution modules
compiled and stored at different nodes in a network, conceivably composed of heterogeneous processors, and
perhaps even written in different languages. These questions lead into a number of subproblems:
conventions for naming files throughout a distributed system, interactions between programming languages and
command languages (note that our network descriptions fall somewhere between the traditional roles for
these two languages), and primitives needed by a programmer for the control and coordination of
multiprocess execution.


3. CONCLUSIONS

The concepts of distributed data processing clearly hold a great deal of promise for solving many
of the problems and limitations currently faced by system designers. It is important, however, to make a
critical analysis of the operational characteristics of any system that is addressing those issues. This
paper has reported on one such effort: the Georgia Institute of Technology Research Program in Fully
Distributed Processing Systems.


4. ACKNOWLEDGMENTS


The work discussed in this paper was performed as part of the Georgia Institute of Technology Research
Program in Fully Distributed Processing Systems. Twelve faculty members have been involved in this major
research program as well as four staff members and over thirty students. The program has been supported
by a number of agencies with the principal funding being provided by the Office of Naval Research, U.S.
Navy, under contract N00014-79-C-0073.

5. REFERENCES


Davi79  Davies, D. W., Barber, D. L. A., Price, W. L., and Solomonides, C. M., Computer Networks and
        Their Protocols. John Wiley and Sons, 1979.

Ensl74  Enslow, Philip H. Jr. (ed.), Multiprocessors and Parallel Processing. New York: John Wiley and
        Sons, 1974.

Ensl78  Enslow, Philip H. Jr., "What is a 'Distributed' Data Processing System?" Computer (January,
        1978): 13-21.

Ensl81  Enslow, Philip H. Jr., "Distributed Data Processing - What Is It?," AGARD Avionics Panel
        Symposium on "Tactical Airborne Distributed Computing and Networks," Norway, (June 22-26, 1981).

Gild76  Gilder, Jules H., "Distributed Processing: Keyword for Tomorrow's Supercomputers," Computer
        Decisions (April, 1976): It.

Hopp79  Hopper, K., Kugler, H. J., and Unger, C., "Abstract Machines Modelling Network Control
        Systems." Operating Systems Review 13 (January, 1979): 10-24.

Lein58  Leiner, A. L., and Weinberger, A., "PILOT, the NBS Multicomputer System," Proceedings of the
        Eastern Joint Computer Conference (1958): 71-75.

LeLa77  Le Lann, Gerard, "Distributed Systems - Towards a Formal Approach," IFIP Congress Proceedings
        (1977): 155-160.


Table 1. "Benefits" Provided by Distributed Processing Systems


A Representative List Assembled from Claims Made in
Actual Sales Literature


High Availability and Reliability
Reduced Network Costs
High System Performance
Fast Response Time
High Throughput

Graceful Degradation, Fail-soft
Ease of Modular and Incremental Growth
Configuration Flexibility
Automatic Load and Resource Sharing
Easily Adaptable to Changes in Workload
Incremental Replacement and/or Upgrade
Easy Expansion in Capacity and/or Function
Good Response to Temporary Overloads


Figure 1. Axes of Distribution




THE EFFECT OF INCREASINGLY MORE COMPLEX AIRCRAFT AND AVIONICS
ON THE METHOD OF SYSTEM DESIGN
J.T. MARTIN

FERRANTI COMPUTER SYSTEMS LIMITED
Western Road, Bracknell, Berkshire, England.


SUMMARY


This paper describes the evolution of Aircraft and their associated 'Avionics'. The evolutionary progress
is considered as starting from a simple low speed Aircraft with rudimentary flight instruments and sighting
systems, through the interconnection of some of these systems and progressing to the recent Avionic Systems
with Centralised Digital Computing.

The paper shows how the changes in aircraft systems, from the simple analog connection of a few systems,
through the analog sensor - interface box - centralised digital system, to the sensor producing digital
outputs - interface box - centralised digital system, have produced comparatively small changes in the
methodology used for the design of these systems.

The move to systems containing distributed processing interconnected by digital highways is shown to be
revolutionary rather than evolutionary and to require a new approach to the System Design problem so as to
reap the maximum advantage from the available computing capability.

1.  INTRODUCTION 

The first aircraft used in war were seen not as fighting vehicles but as information gatherers, especially
for the artillery. Indeed, the ability of the aircraft to fight was scorned by the Generals who controlled
them and, in the beginning, ignored by those who designed them, the main preoccupation being the
production of as stable a platform as possible.

Aerial warfare started in two ways. For air to air attack the standard service revolver was used and for
air to surface attack standard army grenades were simply thrown from the cockpit. No attack avionics were
involved, the sighting system being the barrel of the hand held revolver or the pilot's impression of his
position over the target. It is from these rudimentary beginnings, less than 70 years ago, that today's
highly sophisticated fighting aircraft have evolved. Today we have aircraft specifically designed for
either air to air or air to surface attack, aircraft where the cost of the attack avionics approaches that
of the airframe itself and on which the attack avionics seeks to integrate information from most of the
available aircraft systems and sensors.

The following sections of this paper will describe this evolution in slightly more detail and show how the
System Design process has, up to now, been modified only slightly in order to cope with the increased
complexity.

2. THE EVOLUTION OF AIR TO AIR ATTACK

It is hardly surprising that pilots engaging in air to air combat utilising service revolvers or rifles
rarely succeeded in shooting down their targets. The frustration engendered by this failure of their air to
air weapon system, together with a sudden realisation by those in charge that if aircraft were useful to
them in an observation role then they must be equally useful to the opposition, and should perhaps be
deterred, led to the requirement for a more effective weapon system.

The more effective weapon system became machine guns, either loosely mounted on the aircraft and aimed by an
observer using a sight fixed to the gun, or rigidly mounted on the aircraft and aimed by the
pilot aiming the whole aircraft, and hence the gun, utilising a simple ring and bead sight mounted on the
aircraft.

Although a spectacular increase in success rate was achieved by these methods it was clear that further
improvements could be made. However, this was how the 1914-1918 air war was conducted.

The simple ring and bead sight suffered from two main disadvantages. Firstly, the parallax effect inherent in
attempting to track a target with a mechanical sight some inches from the eyes created large errors.
Secondly, firing at a moving target, the opposing aircraft, means that the bullets must be fired not at the
target but at the position that the target will be at when the bullets arrive.

These sources of error were reduced by the use of the graticule sight which both removed the source of
parallax error and allowed some degree of 'aiming off', although the accuracy of this latter process depended
to a high degree on the quality of the pilot's estimation of the speed and attitude of the target in
relationship to his own aircraft.

The first real sign of avionics in gunsights did not appear until the experimental Gyroscopic Lead Computing
Optical Gunsight appeared in 1940, entering service as the GGS Mk. 2 in 1942. This sight used a gyroscopic
sensing unit and enabled an automatic computation of the required lead angle based on a measure of the rate
of turn of the sight line (measured by the gyroscope), the range (estimated from the pilot's appreciation of
the target size) and the velocity of the bullet or shell (fed into the gunsight as a design parameter).
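
The relationship between these three inputs can be sketched numerically. The sketch below assumes a first-order model in which the lead angle is the sight line rate multiplied by the time of flight of the bullet, with the bullet velocity taken as constant over the range. The function name, the model and the sample figures are illustrative assumptions and are not taken from the gunsight itself.

```python
import math


def lead_angle_rad(sight_line_rate_rad_s, range_m, bullet_velocity_m_s):
    """First-order lead angle: sight line rate times bullet time of flight.

    Assumes a constant bullet velocity over the range (no drag, no gravity drop).
    """
    if bullet_velocity_m_s <= 0:
        raise ValueError("bullet velocity must be positive")
    time_of_flight_s = range_m / bullet_velocity_m_s
    return sight_line_rate_rad_s * time_of_flight_s


# Illustrative figures only: 10 deg/s sight line rate, 300 m range, 750 m/s bullet.
lead_deg = math.degrees(lead_angle_rad(math.radians(10.0), 300.0, 750.0))
print(f"lead angle: {lead_deg:.1f} degrees")
```

With these figures the time of flight is 0.4 s, so the gun must point about 4 degrees ahead of the sight line.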


The 'avionics' associated with the above gunsight was in fact amazingly simple, consisting of a gyroscope to
enable the computation of lead angle and non-linear variable resistors to allow the pilot to enter range and
thus the gravity drop that the bullets or shells could be expected to experience during their trajectory.
However, for the first time an aircraft sensor, albeit one specially added into the gunsight, was being used
in the air to air weapon system.

Range of target was still being entered into the system from a subjective appreciation supplied by the
pilot. In 1949 this requirement on the pilot to provide such information was removed by the advent of radar
ranging systems whereby a target range supplied by radar returns could be inserted directly into the
gunsight.

From this point systems quickly emerged whereby, as the target was held and tracked on the radar and the
necessary adjustments for lead angle compensation and relative position of target and attacking aircraft
were carried out by electronic computation, the pilot no longer had to be able to see his target in order to
engage it.

At this stage the air to air attack system could be considered to have been produced. Aircraft sensors
supplying information regarding the motion of the attacking aircraft, coupled with the sensor information
(from the radar) on the target, produce for a pilot, who may not even be able to see his intended target, the
necessary cues to enable his attack to be carried out.

3. THE EVOLUTION OF AIR TO SURFACE ATTACK

In the same way that the hand held revolver proved fairly ineffective for air to air attack, the hand thrown
bomb also proved to be somewhat less than perfect. There were two main reasons for this lack of
effectiveness. Firstly, it was somewhat unlikely that the pilot or observer would accurately hit the
intended target, having nothing better to calculate a release point with than his judgement of the track of
the aircraft and an impression of the drop characteristic of the weapon. Secondly, a bomb that
is capable of being picked up and thrown from the cockpit is somewhat limited in size and thus effectiveness
upon reaching the ground.

The second of these problems was comparatively easy to solve: larger bombs attached to the aircraft which
fall off upon the application of some form of release command.

The first problem, that of producing the release pulse at the correct time, is not quite so simple. The path
followed by the bomb as it falls will basically depend on the flight characteristic of the weapon, the
velocity of the aircraft (and hence initial velocity of the weapon) at the point of release and the distance
that the weapon must fall (the height of the aircraft at release). Thus occurs the same type of progression
as for gunsights: we move from the simple mechanical sights of early aircraft to the complex release point
calculating computers which are supplied with target position (from radar or laser seeker), aircraft height
(from altimeter), ground speed, air speed and aircraft track and heading (from air data computers).
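
The dependence of the release point on height and velocity can be illustrated with the simplest possible model. The sketch below assumes level flight, no wind and no aerodynamic drag, so the flight characteristic of the weapon is ignored entirely. The function name and constants are hypothetical, and a real release computer would of course model far more than this.

```python
import math

G_M_S2 = 9.81  # assumed constant gravitational acceleration


def release_distance_m(height_m, ground_speed_m_s):
    """Horizontal distance short of the target at which to release.

    Vacuum ballistics from level flight: the weapon keeps the aircraft's
    ground speed while falling for sqrt(2h/g) seconds.
    """
    if height_m < 0:
        raise ValueError("height must not be negative")
    time_of_fall_s = math.sqrt(2.0 * height_m / G_M_S2)
    return ground_speed_m_s * time_of_fall_s


# Illustrative figures only: release from 490.5 m at a ground speed of 100 m/s.
print(f"release {release_distance_m(490.5, 100.0):.0f} m before the target")
```

From 490.5 m the fall takes 10 s, so at 100 m/s the weapon must be released 1000 m short of the target.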

4. OTHER SYSTEMS

So far only the comparatively simple problems of gunsights and bombsights have been considered and, although
these are undeniably important parts of any aircraft weapon system, it is evident that there is
no point in having these systems if the aircraft cannot be positioned in the right place at the right time
so as to be able to use them. It is thus worthwhile to consider some of the other major system components
required.

4.1 Navigation System

Early aircraft had very limited ranges and speeds. The artillery observation aircraft
could often see its own trenches and navigated by flying from one visual landmark to another, landing in a
convenient field if it did lose its way.

With increasing range, speed and landing weight it became necessary to be able to navigate to a target or
area and back to a somewhat more prepared landing strip than the nearest convenient field. Initially it was
possible to achieve this objective by continuing to use the visual landmark with perhaps a good idea of in
which direction the sun should be. Two factors spoilt this happy state of affairs. Firstly the improved
performance and gunsights of the fighter aircraft decreed that the bomber should fly at night and secondly
somebody decided that flying should not be solely a fair weather occupation.

Thus began the two main methods of aircraft navigation: radio aids and aircraft position dead reckoning.

Radio aids for navigation have progressed from the simple bearing from a controller enabling a returning
fighter to be vectored back to its base, through the radio highways produced for bombers by such systems as
Knickebein, Wotan and Oboe to the sophisticated position fixes supplied by systems using Omega and
eventually Navstar. Amongst these radio aids can also be counted the ground mapping radars introduced in
the 1939-45 war and progressively improved ever since.

The dead reckoning aids include such systems as integrating air data computers and of course Inertial
Navigation systems whose accuracies increase almost yearly.
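
Dead reckoning amounts to integrating velocity over time from a known starting position. The sketch below is a minimal flat-earth illustration of one such integration step. The function name, the local north/east coordinate frame and the sample figures are assumptions made for the example and do not describe any particular air data computer or inertial system.

```python
import math


def dead_reckon(north_m, east_m, ground_speed_m_s, track_deg, dt_s):
    """Advance a flat-earth position estimate by one time step.

    Track is measured clockwise from north. Any error in speed or track
    accumulates with every step, which is why such systems drift with time.
    """
    track_rad = math.radians(track_deg)
    north_m += ground_speed_m_s * math.cos(track_rad) * dt_s
    east_m += ground_speed_m_s * math.sin(track_rad) * dt_s
    return north_m, east_m


# Illustrative figures only: 100 m/s due east for 10 s from the origin.
north, east = dead_reckon(0.0, 0.0, 100.0, 90.0, 10.0)
print(f"north {north:.1f} m, east {east:.1f} m")
```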

4.2 Communication Systems

Communications both between aircraft and the ground and between aircraft have progressed somewhat in the
last 65 or so years. We have moved from the artillery aircraft's different coloured Very lights and the
pilot's arm indicating a potential target to another pilot, through the radio with channels A to B, to the
complex array of VHF, UHF and HF channels available to a modern pilot and his crew.




It is now possible for the pilot or crew of one aircraft to select a target for attack and for that target to
be indicated to the pilot or crew of another aircraft via a digital data link, completely
automatically and without a word having been spoken. (As an aside it is also interesting to note that with
the advent of JTIDS we have reverted to the line of sight range of the original Very signal, but at least the
information rate has been increased).

4.3  Pilot's  Aids 

From the above very brief summary of the advances made in aircraft weapon and supporting systems it can
easily be seen that the work load of the pilot or crew of the modern aircraft has increased enormously over
that enjoyed by his historical counterpart. If we add to this list such systems as ESM, ECM, ECCM, the fact
that the aircraft is now flying faster, that the aircraft is probably the target of attacking missiles, that
it is firing missiles, that air engagements between aircraft may be measured in periods of seconds, we can
very quickly see that the pilot or crew can do with any help that they can get.

Avionics can, if used intelligently, solve some of these problems which, to a certain extent, it has helped
to create. Computers can be used to select that information which the pilot needs to know, cathode ray
tubes can be used to display connected information together, or display the most important advisory notices
always in the same place and not spread around the cockpit, multifunction keyboards can replace banks of
switches (some of which inevitably seemed to end up in ergonomically bad positions) and also provide
prompts as to the information or actions required from the pilot or crew. It is at this stage that System
Design should commence.

5. THE AVIONIC SYSTEM AND ITS DESIGN

As can be seen from the above descriptions of the evolution of avionics on aircraft, the early sub-systems
(gunsights, bombsights, navigation etc.) did not form an overall system nor were they designed to do so. For
instance, in section 2 above, we saw how when rate of turn was required to be supplied to the GGS Mk. 2
gunsight the aircraft sensor, the gyroscope, was added to the gunsight producing a self contained unit.

The next step was for one sub-system to supply to another some particular piece of information required by
the recipient sub-system, range from the radar being supplied to the gunsight, for instance.

Certainly up to this point System Design was concerned with ensuring that the various sub-systems on an
aircraft worked satisfactorily but the total Avionics was still a very loosely coupled collection of
separate sub-systems rather than being designed as a total system. The interconnection of sub-systems was
tenuous to say the least and agreement about particular interfaces between two sub-systems could be, and
was, made without consideration being taken, or, to be fair, needing to be taken, of the other aircraft
sub-systems. As long as the synchro outputs from one sub-system matched the orientation of the inputs to the
other and the voltages produced and read at the ends of the connection were agreed as to their meaning then
that was generally the end of the System Design task.

Thus were produced systems such as that shown in extremely simplified form in figure 1. Information is
passed from one sub-system to another as required and as agreed by the sending and receiving parties. If
transformation of the data in terms of, for instance, units was required then it was generally carried out
as required for each sub-system and many different transformations of units might be carried out for one
particular piece of data depending upon which sub-system it was being sent to or received by, the
transformation being carried out by the sub-system least unable to cope with the additional work.

During the 1960s digital computers became available to carry out some of the computations necessary within
the respective sub-systems; however, due to their size and cost they could not effectively be added to any
sub-system that required to carry out a calculation. This led to the concept of a small number of
centralized (from the system point of view) computers receiving data from sensors or sub-systems, carrying
out the necessary calculations, and then feeding the results to the sub-systems that required the results.
Unfortunately, as the interfaces to the sensors and sub-systems still tended to be analog in nature and as
the units used by one sub-system were unlikely to match those required by another, a large part of the
centralized computer task was taken up by performing analog to digital and digital to analog conversions
and in performing digitally the necessary furlongs per fortnight to knots unit conversions.
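
The unit conversion burden described above can be made concrete with a small sketch. It assumes that every speed is converted through one canonical unit (metres per second here), so that a system using n different units needs n conversion factors rather than a separate converter for every pair of sub-systems. The table and function names are hypothetical. The furlongs per fortnight entry simply takes the paper's joke at face value.

```python
# Conversion factors from each unit to the assumed canonical unit (m/s).
TO_M_PER_S = {
    "m/s": 1.0,
    "knot": 1852.0 / 3600.0,                          # one nautical mile per hour
    "ft/s": 0.3048,
    "furlong/fortnight": 201.168 / (14 * 24 * 3600),  # 660 ft per 14 days
}


def convert_speed(value, from_unit, to_unit):
    """Convert a speed between any two known units via the canonical unit."""
    return value * TO_M_PER_S[from_unit] / TO_M_PER_S[to_unit]


# Illustrative figure only: 250 knots expressed in furlongs per fortnight.
print(f"{convert_speed(250.0, 'knot', 'furlong/fortnight'):.0f} furlongs per fortnight")
```

The design choice is the point of the example: where each pair of sub-systems agrees its own conversion, the number of distinct conversions grows with the number of pairs, whereas an agreed common unit keeps it proportional to the number of units in use.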

Figure 2 shows an, again extremely simplified, example of such a system. Unfortunately the system design
techniques used for the loosely coupled system described above still tended to be used for the production of
this type of system. Sensors and sub-systems produced those parameters which were demanded of them and
demanded those parameters which they needed, both sets of parameters being produced or demanded in the
format and units most easily handled by the sub-system, with the resultant interconnection tangle being
left to be sorted out by the centralized computer and its associated interface adapters. Thus the computer
ended up as being the go between for any two potentially connecting systems rather than the producer of a
unified total system.

The next stage in the evolution of avionics produced some sub-systems or sensors containing digital inputs or
outputs instead of the older analog interfaces. This allowed some of the analog to digital and digital to
analog adaptors to be replaced by digital to digital adaptors, a not very encouraging step. The problem, of
course, was one of standardization. Every manufacturer or project had its own pet digital interface and
somehow it was always the incompatible ones that were trying to get together.

6. TOWARDS THE 'PERFECT' SYSTEM

During the 1970s two very important things happened. The digital computer became both small and affordable
and Mil. Std. 1553 was created and won a large measure of acceptability.




Given a digital interface enjoying widespread acceptance the interface adaptors could be put aside. More
importantly, given a digital bus, all sub-systems are automatically connected to all other sub-systems
requiring intercommunication without having to worry about the complexity or cost of producing a special
link for an information path which although desirable is on the face of it not positively essential.
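
The saving in interconnection can be put in numbers. The sketch below compares the number of dedicated links needed to join every pair of sub-systems with the number of connections needed when each sub-system attaches once to a shared bus. It counts physical connections only and says nothing about bus loading or message scheduling. The function names are illustrative.

```python
def dedicated_links(subsystem_count):
    """Links needed to connect every pair of sub-systems directly."""
    return subsystem_count * (subsystem_count - 1) // 2


def bus_connections(subsystem_count):
    """Connections needed when every sub-system attaches once to a shared bus."""
    return subsystem_count


# Compare the two approaches for a few system sizes.
for n in (4, 10, 20):
    print(f"{n} sub-systems: {dedicated_links(n)} dedicated links, "
          f"{bus_connections(n)} bus connections")
```

Under this count an additional information path between two sub-systems already on the bus costs no new wiring, which is why the merely desirable paths become affordable.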

In the same way, given that most sub-systems and sensors can now be, and are now being, supplied with their
own digital computational facilities it is possible for these sub-systems to supply the information that is
required, in the format that is required, to the sub-systems that require it.

Suddenly information can be supplied, in the correct format, as and when required. Many of the old system
design constraints have been removed; reversion can be supplied not by the old back up sub-system approach
but by re-configuring the system data flow. The processing power limitation of the old centralized
computer can be forgotten, but how do we design the system? It is no longer possible for two sub-system
designers to come to a gentleman's agreement about the data to be passed between them in terms of format and
repetition rate. Now every piece of data produced by one sub-system is potential information for every
other sub-system.
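
One way to picture reversion by re-configuring the data flow is a highway on which each parameter has an ordered list of possible sources, so that a consumer is redirected to the next healthy source when one fails. The sketch below is a toy model built on that assumption. The class, method and parameter names are hypothetical, and it does not represent Mil. Std. 1553 message formats or bus control.

```python
class DataHighway:
    """Toy shared data highway with prioritised sources per parameter."""

    def __init__(self):
        self._sources = {}    # parameter name -> source names in priority order
        self._values = {}     # (parameter, source) -> latest published value
        self._failed = set()  # sources currently declared failed

    def register(self, parameter, source):
        """Declare that a source can supply a parameter (earlier means preferred)."""
        self._sources.setdefault(parameter, []).append(source)

    def publish(self, parameter, source, value):
        """Record the latest value of a parameter from a given source."""
        self._values[(parameter, source)] = value

    def mark_failed(self, source):
        """Re-configure the data flow away from a failed source."""
        self._failed.add(source)

    def read(self, parameter):
        """Return (source, value) from the first healthy source with data."""
        for source in self._sources.get(parameter, []):
            if source not in self._failed and (parameter, source) in self._values:
                return source, self._values[(parameter, source)]
        raise LookupError(f"no healthy source for {parameter}")


highway = DataHighway()
highway.register("altitude_m", "radar_altimeter")
highway.register("altitude_m", "air_data_computer")
highway.publish("altitude_m", "radar_altimeter", 152.0)
highway.publish("altitude_m", "air_data_computer", 150.0)
print(highway.read("altitude_m"))       # preferred source
highway.mark_failed("radar_altimeter")
print(highway.read("altitude_m"))       # reversion to the alternative source
```

No dedicated back up sub-system appears in this model: the reversion is purely a change in which published data the consumer reads.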

With a system of the type shown in figure 3 the main limitation is the ability of the System Design
Methodology to cope with the design of the system, not that of the individual sub-systems to cope with their
tasks.

7.  CONCLUSION 

It is now possible, perhaps for the first time ever, to fully integrate an Avionic system and to provide a
means whereby all the necessary, rather than only the essential, information paths can be provided.

We saw, in the example of the gunsight, for instance, how in the past sub-systems have been connected
together so as to provide only the essential information required within the sub-system but in isolation from
the remainder of the total system. Even with the advent of the centralised computer, whether connected to
the remainder of the system by analog converters or discrete digital links, the total system has tended to
be made up of a number of sub-systems with the computer acting as the interface device between sub-systems
and serving the needs of connected sub-systems rather than providing an overall integrated system.

Throughout this period the task of system design has been that of producing compatible interfaces between
one sub-system and another and attempting to produce, with these collections of sub-systems, a final
product that approximates to the original requirements of the customer and intentions of the designer.
Given the facts of both distributed computing and sub-systems interconnected by a common highway, this
rather simplistic (although often far from simple) approach to system design can no longer cope with the
problem to be handled.

To reap the advantages that can be gained from a system built using today's available technology requires
that the system design task be commenced from the viewpoint of the customer's requirement and then
broken down into the sub-systems required to produce the end result. It is no longer possible to arbitrarily
assign tasks to sub-systems without having considered the effect of the assignment on the total system. The
methods used for system design must be able to cope with the task of designing the total system as a unit
rather than a collection of sub-systems. It is only in this way that full advantage can be taken of the
computing power that is potentially available in the modern avionics system.


[Figure: block diagram; surviving labels: AIRCRAFT SENSORS, ALTITUDE, WEAPON SYSTEM]




A  TUTORIAL  ON  DISTRIBUTED  PROCESSING 
IN  AIRCRAFT/AVIONICS  APPLICATIONS 

BERNARD  A.  ZEMPOLICH 

DEPUTY  TECHNOLOGY  ADMINISTRATOR  FOR  COMMAND,  CONTROL  AND  GUIDANCE 
RESEARCH  AND  TECHNOLOGY  GROUP 
NAVAL  AIR  SYSTEMS  COMMAND,  WASHINGTON,  D.  C.  20361 

SUMMARY 

The purpose of this tutorial is to present an overview of the state-of-the-art in real-time distributed
processing as applied to aircraft/avionics. Definitions and concepts are presented starting with the
total aircraft as a real-time distributed computer-controlled system. The relationship of aircraft
mission and avionic system architectures is discussed. Overall system architectural considerations are
identified and their impact upon a Real-Time Distributed Computer-Controlled System is detailed. A
top-down hierarchical, architectural structure is presented. This top-down structuring is described in terms
of the logical functional decomposition of the system as follows: total aircraft/avionic system, partitioning
of aircraft/avionic subsystems, interconnect bus structure (network), system-wide processing architecture,
subsystems definition, and computer systems.

1.0  INTRODUCTION 

By the early 1960s, operational needs in combination with the need for on-board equipment flexibility led
to the introduction of general purpose, programmable digital computers into a variety of aircraft/avionic
systems. The programmability of these machines permitted rapid operational and technical changes to be
made through software modifications rather than through hardware changes. The advent of the integrated
circuit also hastened the introduction of general-purpose digital computers because of the weight and
volume savings that these electron devices had over other competing technologies. These "first generation
airborne computers" were termed "centralized"; that is, all Operational Flight Programs were contained in
the memory of a single machine. Unfortunately, while computer hardware made great strides forward in the
state-of-the-art during this period, the associated software tools did not. Thus, while the use
of digital computers allowed the introduction of many new operational capabilities, management also had
to live with costly, highly complex, and in many instances, inefficient use of the computer as an operational
resource due to the (then) lack of quality software development and support tools.

As the solid-state electronics technology matured, and its products were applied to militarized computers, the
physical size and weight of the on-board computers decreased, which, in turn, led to the availability
of a number of light-weight, lower cost computers. The availability of these computers led to
their incorporation (physically) into various on-board subsystems. Thus, the term "embedded computers"
came about. Eventually, these machines were connected together in what was subsequently termed a
"federation" of computer resources.

As time progressed, the introduction of general-purpose, programmable digital computers continued to
bring about quantum improvements in operational capabilities to military aircraft. Unfortunately, due
to the (then) lack of computer hardware standards, these machines were individually unique from both
hardware and software support considerations. Furthermore, this situation was exacerbated by the fact
that the solid-state electronics industry continued to introduce microelectronic circuits with greater
densities, higher speed performance, and myriad circuit types which made obsolete, almost overnight,
technology advancements which had not yet been fully operationally utilized in a military environment.

The continued proliferation of hardware, the absence of suitable standards, and the ever-increasing
speed at which new solid-state electron devices were being invented and/or created and subsequently
manufactured, led to the establishment by the late 1970s of standards for computer hardware and related
higher order languages. As a generalization, it can be stated that this is the technical management
situation which exists today in 1981.

As we entered the decade of the 1980s, there were many questions yet to be answered relative to computer
architecture and language standards. Specifically, it was postulated that the decades of the 1980s and
1990s would see the introduction of Real-Time Computer-Controlled, Aircraft/Avionic Distributed Systems
containing several hundred microprocessors interconnected by various digital bus schemes. These
microprocessors would be embedded throughout the aircraft as computer resources which control the operation of
a highly fault-tolerant, reconfigurable, hierarchically structured aircraft/avionic system.

A major technical management challenge facing the avionics community today is how to transition from the
current inventory of analog "black boxes" to one in which by the 1990s the inventory will be approximately
90% digital in nature. The main reason why this is a major challenge to the avionics community
is that throughout this transition period, it is imperative to maintain hardware interchangeability and
not upset, nor negate, established hardware and software standards. Table 1 identifies the key technical
characteristics in aircraft/avionic equipments by time frames. The time period 1980 to 1990 itemizes
characteristics expected to be the foundation for future Aircraft/Avionic Real-Time, Distributed
Computer-Controlled Systems.




TABLE 1

1940 - 60
ANALOG

•  Wired programs
•  Dedicated analog processors
•  Integration through pilot displays
•  No redundancy
•  Limited fault tolerance
•  No dynamic reconfiguration capability
•  Discrete hardware

1960 - 80
CENTRAL DIGITAL

•  Stored computer program
•  Central processor(s)
•  Communication through I/O integration and central processor/stored program
•  Some degree of redundancy
•  Some degree of fault-tolerance
•  No dynamic reconfiguration capability
•  Use of MSI & LSI hardware

1980 - 2000
DISTRIBUTED DIGITAL

•  Distributed hierarchical stored program
•  Redundant central processor(s)
•  Distributed, dedicated functional processors
•  Communication through a bus network
•  Large scale use of multi-path redundancy
•  Fault-tolerance and dynamic reconfiguration
•  VHSIC hardware


2.0 SYSTEM ARCHITECTURAL


As the avionics community entered the decade of the 1980s, there was an absence of a generally accepted
system architectural approach to the design and development of on-board aircraft/avionics equipments and
systems. In the absence of a formal system architectural definition, a "Pseudo-Hierarchical Architectural
Structuring" is proposed (see Table 2). This concept is designated as "pseudo" solely because of the
current lack of "reduction to practice" (implementation) of such an approach. It should be noted,
however, that the top-down decomposition of the system architectural structure is real from an engineering
design viewpoint and does indeed lend itself to a logical, natural methodology for decomposition of a
system into its constituent parts.

TABLE  2 

SYSTEM PSEUDO-HIERARCHICAL
ARCHITECTURAL STRUCTURING

•  Total  aircraft/avionics  system 

•  Partitioning  of  aircraft/avionics  subsystems 

•  Inter-connect  bus  structure 

•  System-wide  processing  architecture 

•  Subsystems  definition 

•  Computer  systems 

The total aircraft/avionics system is presented as being at the top of the Pseudo-Hierarchical Architectural
Structuring shown in Table 2. It is presented as the equivalent of the system mission for the aircraft.
For definition purposes, the system mission is taken to be the operational function performed by
the aircraft, such as fighter, attack, Anti-Submarine Warfare (ASW), Airborne Early Warning (AEW), cargo
and/or passenger, or Electronic Warfare (EW). It is the system mission which determines the types,
capabilities, functions, and performance of the various aircraft/avionic electrical and electronic equipment
on-board the aircraft.


2.1  PARTITIONING  OF  AIRCRAFT/AVIONICS  SUBSYSTEMS 

The on-board subsystems required for any given aircraft system mission can be partitioned into a number of
groups of equipments which perform a general functional purpose. For example, the Vehicle Group of
subsystems would contain such equipments as the flight controls, pilots' displays, and the electrical
generators. The Core Avionics Group would contain the communications, navigation, and the computational
resources. The Mission/Sensors Group would contain the specific radars, acoustic sensors, or the
electronic warfare hardware equipments. The Weapons Group is of course self-explanatory as to its
contents. It should be noted that these four major partitions or groups of subsystems are "glued"
together by the System Architecture, Integration, and Common Hardware considerations.

For the foreseeable future, it would appear that the interconnect bus structure will continue to be based
upon the requirements of MIL-STD-1553. However, this statement is not meant to imply that the technology
of implementation will necessarily remain the same. It is logical to expect that, as a minimum, a
fiber-optics bus will be fully implemented and operational by 1990.
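
As background on the MIL-STD-1553 bus named above, the following minimal Python sketch packs the 16
information bits of a 1553B command word (5-bit remote terminal address, transmit/receive bit, 5-bit
subaddress/mode field, 5-bit word count field) and derives the odd parity bit that accompanies them. The
function names are illustrative and do not come from this paper; the sync pattern, Manchester encoding,
and mode-code use of the last field are deliberately left out.

```python
def pack_command_word(rt_address: int, transmit: bool, subaddress: int, word_count: int) -> int:
    """Return the 16 information bits of a MIL-STD-1553B command word.

    Layout, most significant bit first: RT address (5 bits), T/R (1 bit),
    subaddress/mode (5 bits), data word count (5 bits).
    """
    if not 0 <= rt_address <= 31:
        raise ValueError("remote terminal address must fit in 5 bits")
    if not 0 <= subaddress <= 31:
        raise ValueError("subaddress/mode field must fit in 5 bits")
    if not 1 <= word_count <= 32:
        raise ValueError("word count must be in the range 1..32")
    count_field = word_count % 32  # a count of 32 words is encoded as 00000
    return (rt_address << 11) | (int(transmit) << 10) | (subaddress << 5) | count_field


def odd_parity_bit(word: int) -> int:
    """Return the parity bit that makes the number of ones (word plus parity) odd."""
    ones = bin(word & 0xFFFF).count("1")
    return 0 if ones % 2 == 1 else 1


if __name__ == "__main__":
    cmd = pack_command_word(rt_address=5, transmit=True, subaddress=2, word_count=32)
    print(f"command word 0x{cmd:04X}, parity bit {odd_parity_bit(cmd)}")
```

Because the word format is independent of the physical medium, a sketch like this would apply equally
to a wire or a fiber-optics implementation of the same bus requirements.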


2.2  SYSTEM-WIDE  PROCESSING  ARCHITECTURAL  ALTERNATIVES 

Figure 1 is a "road map" of the various System-Wide Processing Architectural Alternatives available to
designers and developers of future aircraft/avionic systems. It would seem reasonable to assume that
more and more the Distributed or Federated Control approaches will be used in future aircraft designs,
while the Centralized Control approach would more likely continue to appear in technology updates of
current aircraft systems. For definitional purposes, System-Wide Processing Architectures are defined
by this author as consisting of all of the on-board embedded computer resources: hardware, software, and
firmware.

As stated previously, there is an absence of formalized industry and government approaches to aircraft/
avionic system architectural considerations. Thus, the following definitions of various processor
architectures are provided as working definitions only. That is, they are possibly subject to refinement
and/or modification. The definitions of Centralized, Distributed, Federated, and Hierarchical System-Wide
Processing Architectures are presented in Table 3 in terms of their key hardware and software
characteristics.


TABLE 3

SYSTEM-WIDE PROCESSING ARCHITECTURES CHARACTERISTICS


CENTRALIZED

Hardware Characteristics
• Powerful central computer (may use multiprocessor or redundant computer)
• All communication through central unit
• Other computers look like peripherals

Software Characteristics
• Single complex executive resident in central unit
• Application programs cover all avionics system functions
• Central unit provides all systems control


DISTRIBUTED

Hardware Characteristics
• High speed computer/computer bus
• Reconfiguration not difficult if:
  1. All computers have same architecture
  2. Multipath communication with peripherals

Software Characteristics
• Low complexity local executives in each computer
• Applications programs limited to local functions in any one computer
• No single source for system control (system control distributed throughout local executives)


FEDERATED

Hardware Characteristics
• Hardware tailored to function
• Low data rate bus communication; bus treated like peripheral
• Reconfiguration difficult
• Computers may have different architectures

Software Characteristics
• Single executive resident in one unit
• Application programs limited to local functions in any one computer
• One unit provides general system control


HIERARCHICAL

Hardware Characteristics
• High speed computer/computer global bus
• Low speed local bus for control
• Computers have same architecture but tailored capability
• Reconfiguration not difficult

Software Characteristics
• Global bus system looks distributed
• Local bus systems look federated with global bus interface computer acting as executive
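
The working definitions in Table 3 can be captured as a small data structure, which makes it easy to
query a single characteristic across the four architectures. The sketch below is illustrative only: the
class name, field names, and the decision to record a blank entry as `None` are assumptions made here,
not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Architecture:
    name: str
    system_control: str                        # where Table 3 places system control
    uniform_computers: Optional[bool]          # None where Table 3 is silent
    reconfiguration_difficult: Optional[bool]  # None where Table 3 is silent


TABLE_3 = [
    Architecture("Centralized", "central unit", None, None),
    # Table 3 makes easy reconfiguration conditional on uniform computers and
    # multipath communication with peripherals; that favourable case is recorded.
    Architecture("Distributed", "distributed throughout local executives", True, False),
    # "Computers may have different architectures" is recorded as non-uniform.
    Architecture("Federated", "one unit", False, True),
    Architecture("Hierarchical",
                 "distributed on the global bus; interface computer is executive on each local bus",
                 True, False),
]


def easily_reconfigurable(table: List[Architecture]) -> List[str]:
    """Names of architectures that Table 3 lists as not difficult to reconfigure."""
    return [a.name for a in table if a.reconfiguration_difficult is False]


if __name__ == "__main__":
    print(easily_reconfigurable(TABLE_3))
```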


Figures 2, 3, 4, and 5 provide diagrammatic representations of the four major System-Wide Processing
Architectures previously identified in Table 3. Based on the working definitions given in Table 3,
architectural options available for consideration by military aircraft/avionic system designers are
listed in Table 4 and are diagrammed in Figures 6 through 11. These options offer the aircraft/avionic
system designers the latitude to maximize those characteristics which are of major importance to the
particular aircraft/avionic system application.


TABLE 4

SYSTEM ARCHITECTURAL OPTIONS APPLIED TO MILITARY NEEDS

Option 1:  Full Functional Redundancy (Figure 6)
Option 2:  Full Functional Redundancy Plus Dedicated Subsystems (Figure 7)
Option 3:  Maximum Physical Redundancy (Figure 8)
Option 4:  Full Functional Redundancy Within Local Group of Subsystems (Figure 9)
Option 5:  Centralized (Figure 10)
Option 6:  Multiprocessor (Figure 11)




2.3  STANDARDIZATION  -  ARCHITECTURE  INTERACTION 

Figure  12  is  an  attempt  to  visually  demonstrate  the  inter-relationships  between  computer  software  and 
hardware  resources  and  the  System  Architecture,  Integration,  and  Common  Hardware  requirements.  It  is 
hoped  that  the  need  for  simultaneous  consideration  of  all  of  these  factors  by  system  designers  can  be 
explicitly  visualized  from  the  structure  of  the  matrix. 

In Figure 12, the FOUNDATION for the entire system is shown as the horizontal bar entitled "System
Architectures". Because it is a foundation, it cuts across each of the vertical bars, which are meant
to convey the idea that the "Missions" are independent, separable, and unique to each operational mission
need. Contained within this concept of the System Architecture as the foundation upon which all the
operational systems are built is the premise that any item identified within the block has general
applicability to all military aircraft systems (when required).

The horizontal bars listed under "Common Functions" are used to indicate equipments or software which cut
across various Missions, but are uniquely tailored to the particular operational application. For
example, signal processors and their associated software programs are used in many Naval aircraft;
however, it is only for the Anti-Submarine Warfare (ASW) Mission that the processor and its associated
software are tailored for the acoustic processing role. In like fashion, the aircraft displays may have
some identical hardware and software used across all aircraft, but, again, any one particular combination
of controls and displays is unique to each Mission application.

2.4  DISTRIBUTED  EMBEDDED  COMPUTATIONAL  RESOURCES 

A key indicator of the degree of distribution of embedded computational resources within an aircraft/
avionic  system  architecture  is  the  total  number  of  microprocessors  used  within  the  on-board  equipments. 

Shown in Table 5 are projected microprocessor counts for "futuristic" Airborne Early Warning (AEW) and
Anti-Submarine Warfare (ASW) aircraft. The information contained in this chart was prepared by a major
supplier of navy aircraft, and as such represents, in the author's opinion, a fairly realistic projection
of the quantities of microprocessors that will be used as on-board embedded computer resources with the
next generation of naval aircraft. Worthy of particular note is the fact that the count difference
between the two aircraft operational applications lies in the area of Mission Avionics rather than in the
Core or Vehicle Systems areas.

TABLE 5

TOTAL SYSTEM MICROPROCESSOR COUNT

                           AEW APPLICATION    ASW APPLICATION

Mission Avionics                  24                 28
Core Aircraft Systems             16                 16
Core Avionics                     97                 97

Total Microprocessors            137                141
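
The observation that the two applications differ only in Mission Avionics can be checked directly from
the Table 5 figures. The short Python sketch below transcribes the counts and recomputes the totals; the
variable and function names are chosen here for illustration.

```python
# Counts transcribed from Table 5.
AEW = {"Mission Avionics": 24, "Core Aircraft Systems": 16, "Core Avionics": 97}
ASW = {"Mission Avionics": 28, "Core Aircraft Systems": 16, "Core Avionics": 97}


def differences(a: dict, b: dict) -> dict:
    """Per-area count difference (b minus a), omitting areas where the counts match."""
    return {area: b[area] - a[area] for area in a if b[area] != a[area]}


if __name__ == "__main__":
    print("AEW total:", sum(AEW.values()))
    print("ASW total:", sum(ASW.values()))
    print("Areas that differ:", differences(AEW, ASW))
```

The totals reproduce the 137 and 141 microprocessors quoted in the table, and the only non-zero
difference is the four additional Mission Avionics microprocessors in the ASW application.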

Table 6 itemizes the number of reprogrammable and fixed program microprocessors projected for certain
types of functional avionic equipments. This chart was prepared by a firm currently engaged in providing
similar equipment for operational use. And again, as with the statement made relative to the information
contained in Table 5, it reflects more than a reasonable degree of engineering certainty as to the
validity of the estimates shown.

TABLE 6

SELECTED AVIONICS SUBSYSTEMS MICROPROCESSOR COUNT

                                            Number of          Number of
                                            Reprogrammable     Fixed Program
Function                                    Microprocessors    Microprocessors    Total

Data System and Displays (Core Mission)           25                 64             89
Core Sensors & Conditioning                       16                 26             42
Acoustic Signal Processing                         8                 12             20
Radar Signal Processing                            6                  6             12
Other ASW Sensors & Conditioning                   9                 26             35

TOTAL                                             64                134            198
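
The row and column totals of Table 6 can be recomputed from the individual entries, which also gives the
fraction of microprocessors that are reprogrammable. The sketch below is a plain arithmetic check; the
names `ROWS`, `column_totals`, and `reprogrammable_share` are introduced here for illustration.

```python
# (function, reprogrammable, fixed program) rows transcribed from Table 6.
ROWS = [
    ("Data System and Displays (Core Mission)", 25, 64),
    ("Core Sensors & Conditioning", 16, 26),
    ("Acoustic Signal Processing", 8, 12),
    ("Radar Signal Processing", 6, 6),
    ("Other ASW Sensors & Conditioning", 9, 26),
]


def column_totals(rows):
    """Return (reprogrammable, fixed program, overall) totals."""
    reprogrammable = sum(r for _, r, _ in rows)
    fixed = sum(f for _, _, f in rows)
    return reprogrammable, fixed, reprogrammable + fixed


def reprogrammable_share(rows) -> float:
    """Fraction of all listed microprocessors that are reprogrammable."""
    reprogrammable, _, total = column_totals(rows)
    return reprogrammable / total


if __name__ == "__main__":
    print(column_totals(ROWS))
    print(f"reprogrammable share: {reprogrammable_share(ROWS):.1%}")
```

On these figures roughly one third of the projected microprocessors are reprogrammable, which bears on
the "fixed function vs. programmable" question the paper lists among its challenges.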

3.0  FUTURE  TECHNOLOGY  CONSIDERATIONS 


There  are  a  number  of  considerations  which  must  be  taken  into  account  relative  to  the  transfer  and 
insertion of new technologies into future Real-Time Aircraft/Avionic Distributed Computer Control Systems.
First  of  all,  the  Real-Time,  Computer-Controlled,  Distributed  System  of  the  future  will  require  that  the 
system  conceptual  and  definition  phase  of  each  future  aircraft  program  consider  the  inter-relationships 
of  factors  such  as:  support/tools  software,  applications  software,  firmware,  computer-aided  design, 
test  and  manufacturing  software,  processing  system  architecture  software,  and  simulation,  test  and 
diagnostics  software. 




Secondly, system designers and developers must take into consideration the technology directions listed
in Table 7.

TABLE  7 

TECHNOLOGY  DIRECTIONS 

• Software function taken over by firmware in near-term

• VHSIC chips take over software functions in the long-term

• Emergence of hardware macros as basic building blocks

• Signal processing as dominant thrust

• More "Modular" software

• New systems will be fault-tolerant, redundant, reconfigurable

• Emergence of the "Smart", reconfigurable memory system

• Greater thrust for extracting data from aircraft/avionic systems.

Lastly, the systems designers and developers must consider the items listed in Table 8. These items
represent the author's best judgment as to the challenges to be faced by the management and engineering
staffs both in government and in industry involved in the system conception, definition, design,
development, test and evaluation, and subsequent logistic support of future Real-Time, Computer-Controlled
Distributed Systems for aircraft/avionic applications.

TABLE  8 

CHALLENGES  TO  BE  FACED 


• Amount of embedding into the system architecture

• Systems engineers, not computer specialists/engineers, performing the design function

• Primary failures will be at the system level, not at the component level

• Lack of economic leverage

• Rapidity of change in the microprocessor state-of-the-art

• Fixed function vs. programmable microprocessors

• Lack of precise definitions throughout the field.

4.0  CONCLUSION 

If there is any one conclusion which can be reached in trying to understand the attributes of Real-Time
Aircraft/Avionic Distributed Computer Control Systems, it would have to be, in the opinion of this author,
that system designers and developers can no longer build such systems from the "bottom-up", black box
approach. A partnership between the technical managers, the system designers, and the various
technologists is required if future systems are to be developed with minimum proliferation of the embedded
computer resources, minimum logistics for both the avionics hardware and the software, and maximum
availability in the operational environment.


Figure 1  System-Wide Processing Architectural Alternatives

Figure 2  Centralized Architecture

Figure 3  Distributed Architecture [label: computer/computer bus]

Figure 4  Federated Architecture [label: control bus]

Figure 5  Hierarchical Architecture [labels: global bus, local bus, computer, peripherals]

Figure 6  Option 1

Figure 7  Option 2

Figure 9  Option 4

Figure 12  Standardization - Architecture Interaction Matrix


DISCUSSIONS 
SESSION  I 


REFERENCE NO. OF PAPER: 1-1
DISCUSSOR'S NAME: Dr. von Issendorff
AUTHOR'S NAME: Enslow (Livesey, presenter)

COMMENT: You mentioned that there are no suitable programming languages for distributed systems so
far. Among others there are CSP, PLITS from Feldman, and ADA. Could you please comment on why these
languages are not sufficient?

AUTHOR'S REPLY: Sufficient for what? These languages do, of course, allow us to write distributed or
concurrent programs, but this is only 10 percent of the problem. We need active programming
environments including program specification, design, verification and debug tools for distributed
systems (test tools, too). These problems are especially difficult in a distributed system. (This is
the opinion of the presenter, and not necessarily that of the author.)


REFERENCE  NO.  OF  PAPER:  1-1 

DISCUSSOR'S NAME: Erwin C. Gangl, WPAFB, Ohio

AUTHOR'S NAME: Enslow (Livesey)

COMMENT: In the application of distributed systems, flight safety concerns are reliability of hardware
and software performance and guaranteed real-time response. Normally this can be obtained by maturing the
software through extensive use and correcting the bugs. We cannot use this approach in flight safety
systems since we have to have reliable software prior to first flight. Therefore, we have to
accomplish this through exhaustive testing. How can we get reliable software by testing, since in
distributed systems it is impossible to predict and exercise all possible states of the system?

AUTHOR'S REPLY: This is also true for centralized systems and is not a special problem of fully
distributed processing systems. I do not know of any "magic" solutions, but rather see the continued
use of top-down design, verification, automatically generated test cases, extensive simulation and
perhaps new tools such as IPC control languages. (This is the opinion of the presenter and not
necessarily that of the author.)
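
The scale of the state-coverage problem raised in this exchange can be illustrated with a minimal
Python sketch. It assumes, purely for illustration, a system of independent sequential processes that
each execute a fixed number of atomic steps, and counts the distinct orders in which those steps can
interleave; the function name is hypothetical and the model is far simpler than any real avionic system.

```python
from math import factorial


def interleavings(processes: int, steps: int) -> int:
    """Number of distinct interleavings of `processes` sequences of `steps` atomic steps.

    This is the multinomial coefficient (processes * steps)! / (steps!) ** processes.
    """
    if processes < 1 or steps < 0:
        raise ValueError("need at least one process and a non-negative step count")
    return factorial(processes * steps) // factorial(steps) ** processes


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} processes x 10 steps: {interleavings(n, 10)} interleavings")
```

Even in this simplified model the count grows far faster than the number of processes, which is
consistent with the reply's emphasis on design discipline, verification, and generated test cases rather
than exhaustive enumeration.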


REFERENCE  NO.  OF  PAPER:  1-1 
DISCUSSOR'S NAME: B. A. Zempolich
AUTHOR'S NAME: Enslow (Livesey)

COMMENT: Do you distinguish between ADA as a programming language and software development tools, such
as Hardware Description Languages?

AUTHOR'S REPLY: I'm not an ADA expert. However, the direction I see ADA going is that users of ADA
will subset it and that subset will look a lot like PASCAL. Another group of programmers will be
trained primarily in the use of tasking facilities. Other programmers will spend most of their time on
developing more advanced debugging, testing, and specification tools to fit around ADA, the so-called
ADA environment. I expect the most exciting work to be done in the environment rather than in language
development itself. We have enough programming languages. What is needed are the tools to enable
people to use them.


REFERENCE  NO.  OF  PAPER:  1-1 
DISCUSSOR'S  NAME:  Dr.  van  Keuk 
AUTHOR'S NAME: Enslow (Livesey)

COMMENT: Would you say that it remains sensible to think about distributed processing without
having a precise, limited, and well-analyzed application in mind? The software
and hardware structure will often be dictated to a high degree by the particular kind of application.

AUTHOR'S REPLY: I think that both jobs are needed: (1) Basic research into abstract distribution
systems without the constraints of a particular application, and (2) applied research into the
performance and other special constraints of particular problems. Either will be much less useful
without the other. (This is the opinion of the presenter, not necessarily that of the author.)




REFERENCE  NO.  OF  PAPER:  1-3 
DISCUSSOR'S NAME: G. Scotti, SELENIA
AUTHOR'S  NAME:  J.  T.  Martin 

COMMENT: I feel that there are several other reasons to explain why the U.K. wrote the DS 00-18. Can
you please state the differences between 1553B and the 00-18 Standard?

AUTHOR'S REPLY: The U.K. decided to produce Defence Standard 00-18 (Part 2) because 1553B was seen to
be of such great use in so many applications that it was felt that it should be possible to specify the
bus using a U.K. standard rather than by continually referring to a U.S. standard. The U.K. Defence
Standard 00-18 (Part 2) is absolutely technically identical to MIL-STD-1553B. The differences in format
and language used in Def. Stan. 00-18 (Part 2) come about purely and simply because the U.K. Authority for
Defence Standards has certain rules which apply to the format and language used in a U.K. Defence
Standard.


Just to reinforce and confirm:

Def. Stan. 00-18 (Part 2) is technically identical to MIL-STD-1553B. Units built to either
standard will be just as compatible with units built to the other standard as if they had all been
built to the same standard.


It may, however, be interesting to note that there are more things in MIL-STD-1553B, and hence in
Def. Stan. 00-18 (Part 2), that are not completely specified. For instance, although some responses
are defined as legal and some responses are defined as illegal, there are still some responses which
fall between the two definitions, and the action to be undertaken in the event of receiving such a
response is therefore not clearly defined.


In an attempt to promote as much standardization as possible the U.K. has, therefore, produced
defined actions to be followed in the event that such a response is received. These U.K. preferred
responses are documented in the guide to Def. Stan. 00-18 (Part 2). This guide has the reference Def.
Stan. 00-18 (Part 1). The important difference is that whereas the requirements of Def. Stan. 00-18
(Part 2) are mandatory, the further information in Def. Stan. 00-18 (Part 1) is only advisory.


REFERENCE  NO.  OF  PAPER:  1-3 
DISCUSSOR'S NAME: H. Whitehouse, USN
AUTHOR'S  NAME:  J.  T.  Martin 

COMMENT: Would you comment on the properties of an avionics bus which are not provided by standard
commercial buses such as the HP-IB or its IEEE counterpart?

AUTHOR'S REPLY: MIL-STD-1553B has come about not just in order to redesign the wheel but because none
of the commercial buses available at the time were satisfactory for the application. The requirements
for a commercial interface include: high-speed capability (therefore, fast logic edges or parallel
interface) and cost-effectiveness, bearing in mind the environment that the interface is to operate
in. The requirements for an avionic bus include: EMC compatibility (therefore, controlled logic
edges), low wiring density (to reduce weight and volume), and reliability and availability, leading
usually to a dual bus configuration (making the use of serial transmission techniques even more
important).

The above is a very brief summary of the reasons for MIL-STD-1553B. For a full account see the
MIL-STD-1553B Handbook and/or Defence Standard 00-18 (Part 1), the handbook and explanation for Defence
Standard 00-18 (Part 2).


REFERENCE  NO.  OF  PAPER:  1-3 

DISCUSSOR'S NAME: CDR J. A. Strada, ONR, London

AUTHOR'S NAME: J. T. Martin

COMMENT: How do you see the role of distributed processing in reducing crew workload and dealing with
the multisensor environment in an ASW aircraft like the P3C or Nimrod?

AUTHOR'S REPLY: Distributed processing does not really affect crew workload as such. The crew should
be unaware of the design of the system that they are using. The main item to affect crew
workload is the design of the man-machine interface; this includes, of course, the keyboards, the
displays and the processing which allows these keyboards and displays to function.

Having said that, it is true that a distributed processing system whereby the various subsystems of
the system are connected together by, for instance, a 1553B bus does lend itself to the combining of
information into one place and the control of a number of systems from one place. Although the same
effect, as far as the crew is concerned, could be achieved without such a distributed system, I believe
that you would have to pay a high hardware overhead, for instance many extra I/O control channels from
the centralized system, to produce the same degree of centralization of controls and displays.


REFERENCE  NO.  OF  PAPER:  1-3 


DISCUSSOR'S NAME: Dr. A. A. Callaway, RAE
AUTHOR'S  NAME:  J.  T.  Martin 

COMMENT: Mr. Martin has talked about the opportunities for using distributed processing in modern
systems. There are many constraints which can be applied in distributing processing, such as minimum
data flow, retaining comparable processing sizes, etc. In practice, however, because of the way
systems are procured, and the accountability of manufacturers, does the author, as a representative of
an industrial systems company, see us ever achieving anything other than functional distribution as a
practical criterion?

AUTHOR'S REPLY: The problem is to fully specify the requirements to be placed on the supplier and to
be able to specify the tests necessary to prove that the supplied item exhibits the attributes which
are demanded. Very few manufacturers actually manufacture all items of the subsystems or system that
they supply (for instance, a sensor head may be purchased by a system supplier to be added into his
total system or subsystem by way of a subcontract or another supplier). For these items of subcontract
to be procured and accepted it must be possible to specify them and test them to that specification.

If it is possible for one main or prime supplier of a system to specify such a subcontracted item, then
it must be possible for some other supplier or procurement agency to also produce the necessary
specification. We could therefore imagine the case where a system design is carried out by one firm to
the level necessary for the equipment and subsystems specifications to be produced, using as a criterion
for the division of the work any split required, as long as it leads to the required specifications and
test specifications.

Summarizing: technically any split is possible and already achieved. Managerially, especially in the
case of the procurement agencies, it may be necessary to reconsider our present working practices.


REFERENCE  NO.  OF  PAPER:  1-4 

DISCUSSOR'S NAME: CDR J. A. Strada, USN, ONR-London
AUTHOR'S NAME: B. A. Zempolich

COMMENT: Reference the position of "Systems Architect" in NAVAIR. For whom would such an individual
work during aircraft development? Would he stay with the aircraft as it moves into an operational
status? For whom would he work then?

AUTHOR'S REPLY: (1) The PMA and his administrative division.

(2)  Yes,  he/she  would  stay  with  the  aircraft. 

(3)  Continue  to  work  for  the  PMA. 




PERFORMANCE  STUDY  OF  A  DISTRIBUTED  MICROPROCESSOR  ARCHITECTURE 
FOR  USE  ABOARD  MILITARY  AIRCRAFT 


Kang  G.  Shin  and  C.  M.  Krishna 
Electrical,  Computer,  and  Systems  Engineering  Department 
Rensselaer  Polytechnic  Institute 
Troy,  New  York  12181 


ABSTRACT 


An analysis of the performance of the Distributed Microprocessor Airborne Computing System (DMACS) developed
at Rensselaer Polytechnic Institute is presented. The DMACS consists of a number of quasi-independent
computer subsystems loosely coupled in a highly decentralised structure that yet exhibits high cogency as a
system. Some important parameters in the system such as job scheduling and starting delays, bus access delay
and system reliability are studied.

In order to highlight the implications of the design options chosen, the structure of the DMACS and that of
the Draper Laboratory's Fault-Tolerant Multiprocessor (FTMP) system are compared and the impact of structure
on performance is discussed qualitatively.

1.  INTRODUCTION 

The increasing sophistication of fighter aircraft has raised the need for intelligent control equipment.
All too often at present, this equipment has been added in an ad-hoc fashion. The result has been a
variety of independent systems for such functions as fire control, flight control, navigation, etc. This
leads to wasteful redundancy, to relatively low system reliability and a high workload upon the pilot (who
is the coordinating agency). This is the main motivation for a new concept called Integrated Control
(Robinson, A.C., and Hitt, E.F., December 1978; Shin, K.G., December 1979). Integrated Control (IC) is
the use of control equipment as part of an organized and cogent system. Integrated Control treats the
entire aircraft (the pilot included) as an organic whole. Considerable benefits follow. For one thing,
equipment redundancy translates more efficiently to fault-tolerance. For another, the pilot, while still
the coordinating agency, is no longer involved in step-by-step and detailed low level control. Instead,
he is largely the decider of policy, choosing from a number of options open to him and letting the system do
the rest.

Needless to say, Integrated Control requires a sophisticated computer system that is highly reliable and is
capable of absorbing with equanimity the large surges of throughput demand that are characteristic of the
application at hand.

A number of attempts have been made to design highly reliable systems. These include the Software
Implemented Fault Tolerance (SIFT) machine of SRI International (Wensley, J.H., et al., October 1978), the
Multi-Microprocessor Flight Control System (M²FCS) program of the Air Force Flight Dynamics Laboratory
(AFFDL) and Honeywell (White, J.A., et al., October 1979) and the Fault Tolerant Multiprocessor (FTMP) of
the Charles Stark Draper Laboratory (Hopkins, A.L., et al., October 1978). The last of these is an
especially interesting design and is the result of certain well-defined design choices.

In a project recently initiated by the authors at Rensselaer Polytechnic Institute, an attempt has been made
to design a high-reliability and high-throughput machine with extensive decentralization of control (Shin, K.G.,
and Krishna, C.M., December 1980). It has been sought to use the extended capabilities of recently developed
microprocessors such as the Motorola 68000 and the AMD 2903. Delegation of control has been maximized. The
system is entirely asynchronous and highly modular. Use has been made here of the essential characteristics
of the application. In the first place, the aircraft mission can be rather neatly divided into nearly
independent portions. This decomposition of the mission into its component parts is formalized in the concept
of atom functions. Again, the nature of the application suggests a system dichotomy. This translates into
the way the architecture is composed: we have a central area and a peripheral area, each with its own
distinctive characteristics. The peripheral area is dedicated to particular tasks such as data-taking and
actuator-driving, whereas the central area is in a symmetric formation and is therefore not dedicated to any
particular task. This has obvious implications for reliability, both the actual system reliability and
the ease with which theoretical predictions concerning reliability may be made.


Also, the present architecture has been explicitly based on the concept of Integrated Control. This implies
that it has been attempted to attack the aircraft control problem holistically and from a systems point of
view. This is a departure from other distributed systems in that these have generally tended to consider only
the computing equipment without much consideration being given to the operating environment.

This paper is organized as follows. Section 2 consists of an overview of the system architecture. This is
abridged from an earlier publication (Shin, K.G., & Krishna, C.M., December 1980) and is presented here for
completeness. Section 3 focuses on the central controller. The nature of the controller's functions has a
decisive impact on system performance. Section 4 deals with the performance evaluation of the system. A comparison
with the Draper Laboratory's FTMP is provided in Section 5 and the paper concludes with Section 6.

2.  REVIEW  OF  DMACS  ARCHITECTURE 

The DMACS architecture is based on both mission decomposition and system dichotomy. The architecture has
been described in some detail in (Shin, K.G., and Krishna, C.M., December 1980), but for convenience, the
major aspects are briefly described below.




2.1  Mission  Decomposition 

The ordered set of tasks to be performed by an airborne computer system is termed a mission. A mission consists
of mission segments such as takeoff, cruise, target tracking, landing, etc. Each mission segment is
then divided into basic mission components called atom functions. An atom function may be considered a unit
program performing a basic unit of the mission such as Kalman filtering, control law calculation, etc.

Each atom function receives raw data from its source set (of sensors, pilot-activated systems, and ground
communication systems) and feeds a sink set (actuators and the cockpit display) with processed data. In
view of the ever-increasing computation power of microprocessors it is not unreasonable to assume that any
atom function can be handled by a single advanced microprocessor (e.g. M68000, LSI-11/23, AMD 2900 series)
within the imposed time limit. This assumption together with the mission decomposition offers system modularity
in both hardware and software, thereby enabling an atom function to be a hardware and software building
block in the DMACS.

2.2  System  Architecture 

The input to the system is derived from sensors, ground communications and pilot-generated inputs. The
rate of data flow is small: only a few Hertz. The outputs of the system are to mechanical actuators and
to the cockpit display. These, again, are at low data rates. In contrast, the computations themselves are
generally involved and are required to be carried out at high speed.

Clearly, the processors handling the data formatting tasks from the individual sensors have to be dedicated.
The processors carrying out the bulk of the processing do not have to be so dedicated.

By means of arguments similar to the above, it is possible to show that the application calls for a system
dichotomy. Such a dichotomy is indeed built into the system and represented by the peripheral and central
areas (Figure 1). The peripheral area consists of the sensors, actuators and associated (dedicated) processing
equipment. This equipment is relatively low-capability hardware. The central area consists of
high performance Processing Modules (PM's). Each PM consists of a main processor with its own private memory
and two bus controllers to interface with the data and control serial bus sets. These are the only buses
in the system and are triply redundant. The basic system architecture is depicted in Figure 2 and the peripheral
area is shown in some detail in Figure 3.

3.  MORE  ON  THE  DMACS  ARCHITECTURE 

The Central Controller (CC) is the top coordinating agency after the pilot and has a decisive impact on system
performance. Prior to performance analysis, therefore, it is in order to discuss the CC along with
architectural implications.

3.1  Central  Controller 

The CC is at the heart of the DMACS and operates in two different modes: the normal and abnormal modes. The
extent to which the architecture has been decentralized results in a light controller loading under normal
conditions. The system can be thought of as being a set of quasi-independent computer subsystems, each member
of the set being formally complete within itself under most normal conditions of operation. However,
the system may behave like a centralized computer under abnormal conditions (e.g. change of mission profile).

A.  Normal  Mode  of  Operation 

The central controller has, under normal operating conditions, to carry out periodic error checks and to
control the allocation of the major common resource in the system: the data bus. Data bus grant is requested
and granted asynchronously according to a quasi-handshake format. The main processor within the
processing module places the data word to be broadcast in the data bus controller. Bus grant requests are
entirely within the domain of the two bus controllers; insofar as the main processor in the processing
module is concerned, the bus controllers represent its only contact with the outside world.

The data bus controller signals the control bus controller (CBC) that a data word is available for broadcast.
The CBC responds by setting the data bus grant request bit in its transactions register. The transactions
register is periodically polled by the central controller and bus grant is achieved on a first-come
first-served basis.
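
The request and grant sequence described above can be sketched in a few lines of Python. This is an illustrative model only, not the DMACS implementation: the names `ControlBusController`, `data_word_ready` and `poll_and_grant`, the single Boolean request bit, and the use of a queue inside the central controller are all assumptions made for the example. Requests that appear within the same polling period are ordered here by scan order, which is a simplification.

```python
from collections import deque


class ControlBusController:
    """Hypothetical model of a CBC holding one data bus grant request bit."""

    def __init__(self, pm_id):
        self.pm_id = pm_id
        self.bus_request = False  # request bit in the transactions register

    def data_word_ready(self):
        # The data bus controller has signalled that a word awaits broadcast.
        self.bus_request = True


def poll_and_grant(cbcs, pending):
    """One polling pass by the central controller.

    Newly observed requests join the tail of `pending`; the oldest
    outstanding request, if any, is granted the data bus.
    Returns the id of the PM granted the bus, or None.
    """
    for cbc in cbcs:
        if cbc.bus_request and cbc.pm_id not in pending:
            pending.append(cbc.pm_id)
    if not pending:
        return None
    granted = pending.popleft()
    for cbc in cbcs:
        if cbc.pm_id == granted:
            cbc.bus_request = False  # request satisfied, clear the bit
    return granted
```

A request seen in an earlier polling pass is always served before one seen later, whatever the PM numbers involved, which is the first-come first-served behaviour the text describes.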

B.  Abnormal  Mode  of  Operation 

Central controller intervention on a large scale is called for when abnormal events occur. These may call
for a redistribution and/or redefinition of system resources. The following are the most commonly encountered
abnormal occurrences:

• Malfunctions in PM's

• Mission profile changes

• Test requests from the peripheral area.

To  handle  these  occurrences,  the  CC  needs  precise,  accurate  and  timely  information  on  the  status  and  duty 
of  each  processor  in  the  system.  The  principal  table  of  information  held  within  the  central  controller  is 
the  Central  Cluster  Status  Table  (CCST).  The  CCST  has  the  following  format: 




Atom function | Active/Passive | Processors assigned | Processor status

The atom functions are ordered according to their importance. Processors not assigned to any atom functions
(i.e. free PM's) are listed as being assigned to atom function 0 (i.e. the lowest priority atom function).
Processor Modules that are malfunctioning are assigned an atom function number one higher than the most critical
function of all, the control function.


The active/passive column indicates whether or not the concerned atom function is active within the current
mission profile. (Note that all atom functions possible are listed: not just those that are currently active.
This does not cause a time penalty for table-search during reallocation due to the way the table is structured.)
Inactive functions generally do not have any PM's assigned to them.

When  a  PM  malfunction  is  reported,  the  central  controller  scans  the  CCST  from  the  bottom.  If  —  as  is 
generally  the  case  —  there  is  a  free  PM  available,  that  PM  is  brought  into  the  depleted  triad. 

In  the  event  that  there  is  no  free  PM  available,  the  least  critical  atom  function  is  retired  and  the  PM's 
assigned  to  it  are  used  as  spares. 

The  process  of  triad  reconfiguration  is  as  follows: 


1.  Delink  the  injured  processor. 

2.  Find  a  replacement  PM. 

3.  Load  status. 

4.  Check  status  and  induct  into  the  system. 


Of these steps, steps 1, 2 and 4 are controller-intensive, i.e. they require extensive controller involvement.
Step 3 on the other hand is handled without much reference to the controller. Transfer of software is carried
out by DMA.
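
The replacement procedure can be illustrated with a small sketch. It is not the authors' code: the dictionary layout of the CCST, the function name `replace_failed_pm`, the `failed_row` parameter, and the rule that only functions less critical than the injured one may be retired are assumptions made for the example. Steps 3 and 4 (loading and checking status) are left out.

```python
FREE = 0  # atom function 0 lists the free PMs (lowest-priority row of the CCST)


def replace_failed_pm(ccst, failed_pm, failed_row):
    """Replace a failed PM; return (replacement_pm, retired_function).

    `ccst` maps atom-function number -> list of PM ids, larger numbers
    denoting more critical functions. `failed_row` is the row, one above
    the most critical function, where malfunctioning PMs are parked.
    """
    # Step 1: delink the injured processor from its triad.
    injured_fn = next(fn for fn, pms in ccst.items()
                      if fn != failed_row and failed_pm in pms)
    ccst[injured_fn].remove(failed_pm)
    ccst.setdefault(failed_row, []).append(failed_pm)

    # Step 2: find a replacement, scanning from the bottom of the table.
    retired = None
    if not ccst.get(FREE):
        # No free PM: retire the least critical function that still holds PMs.
        # Assumption: only functions less critical than the injured one qualify.
        donors = [fn for fn in sorted(ccst)
                  if FREE < fn < injured_fn and ccst[fn]]
        if not donors:
            return None, None  # nothing to give up; the triad stays depleted
        retired = donors[0]
        ccst.setdefault(FREE, []).extend(ccst[retired])
        ccst[retired] = []
    replacement = ccst[FREE].pop(0)

    # Steps 3 and 4 (load status by DMA, check status) are not modelled here.
    ccst[injured_fn].append(replacement)
    return replacement, retired
```

With one spare available, a first failure is absorbed by the spare; a second failure, with no spare left, retires the least critical function and draws on its PM's.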


A slightly more complicated process is involved when triad reconfiguration is called for. The loading upon
the controller is far greater than in the case of a random processor knockout (i.e. the random demise of a PM).
Also, the volume of software to be transferred is far greater. The latter is the more straightforward
to handle: the time required on a bus for software transfer is very nearly proportional to the volume of software,
while the controller loading is more difficult to quantify precisely.

The effect of controller loading upon the system is minimized by carrying out the reconfiguration in stages,
configuring the most important triads first, so that the more critical new functions are activated as soon as
possible. Note that functions such as flight control are active throughout and are not affected by reconfiguration
except to handle malfunctions.

3.2  Implications  of  the  Architecture 

There are some particular aspects of the application we are concerned with and some points in the architecture
here presented that are worth further discussion.

The  most  important  point  to  consider  in  aircraft  control  is  that  the  atom  functions  into  which  the  mission 
divides  are  essentially  decoupled.  Flight  control,  fire  control  and  navigation  (to  name  just  a  few  atom 
functions)  affect  different  actuators.  The  application  at  hand  is  characterized  by  the  fact  that  while 
different  atom  functions  might  be  triggered  by  common  sensor  inputs,  the  sink  sets  of  the  individual  atom 
functions  are  generally  distinct.  It  should  be  noted  here  that  the  pilot  is  still  the  overall  coordinating 
agency  —  albeit  at  a  much  higher  level  than  in  the  conventional  method. 

From  this  fact  follows  the  present  architecture  which  is  not  so  much  a  true  multiprocessor  architecture 
(Enslow,  P.H.,  March  1977)  as  it  is  a  collection  of  cooperatively  coupled  computer  systems  that  require 
controller  intervention  at  a  low  level  for  most  of  the  time. 


A  second  point  worth  considering  is  the  existence  of  two  distinct  bus  sets  -  the  data  bus  set  and  the  control 
bus  set.  The  control  bus  simplifies  the  executive  software  considerably  and  reduces  the  need  for  tight 
synchronization  in  the  system. 


Linked  with  the  idea  of  a  control  bus  permanently  captured  by  the  central  controller  are  the  bus  controllers 
and  the  architecture  of  each  PM.  The  PM  admits  of  considerable  internal  decentralization.  The  two  bus 
controllers  —  which  are  actually  dedicated  processors  with  their  own  buffers  —  handle  transactions  with  the 
outside world. The Control Bus Controller (CBC) is the "local arm" of the central controller. The CBC is
activated  by  central  controller  command  and  is  thus  entirely  under  central  control;  but  it  has  sufficient 
intelligence  to  reduce  controller  loading.  (An  analogy  may  here  be  drawn  between  the  above  and  the  channel 
and  device  controllers  in  a  commercial  computer  system.  With  a  modestly  intelligent  device  controller,  the 
channel  controller  simply  needs  to  initiate  device  action  and  let  the  device  do  the  rest  until  a  device  end 
is  encountered.  Examples  of  device  controllers  are  disk  controllers,  card-reader  controllers,  etc.)  The 
control  hierarchy  in  DMACS  is  as  follows: 




Central Controller
        |
Control Bus Controller


4.  SYSTEM  PERFORMANCE 

4.1  Job  Starting  and  Job  Scheduling  Delays 

Job scheduling delay is defined as the duration between a job request and the allocation of the system resources
for the execution of the job. Job starting delay is the time delay between the job request and the
actual execution of the job.

Due to the quasi-static scheduling policy followed, job scheduling and job starting delay are relevant only
when jobs need to be scheduled, i.e. at moments of change in mission profile or during PM replacements.

Changes in mission profile are brought about when there is a demand for a new set of atom functions. The
varying nature of mission requirements suggests two choices: either allot a separate PM triad for every atom
function (whether required in the current mission profile or not) or allot PM triads only on demand. The
second approach is the more practical and is adopted here. There are certain flight functions such as flight
control and navigation that must remain operational throughout the mission lifetime. Others such as those
used for landing and takeoff are available on demand. It is the scheduling delay for these jobs that we are
primarily concerned with: job scheduling delays do not ordinarily affect the "perennial" atom functions such
as those cited above.

Job  starting  and  scheduling  delays  are  expressed  as  follows: 

Job starting delay = $t_1 + t_2 + t_3 + t_4$

Job scheduling delay = $t_1 + t_2$

where

$t_1$ = time taken by the controller to take action upon the request for that particular atom function
$t_2$ = processor allotment time
$t_3$ = software transfer time
$t_4$ = processor check time

Of these times, $t_1$ and $t_3$ are highly variable; $t_2$ and $t_4$ are not so inconstant.

The time invested by the central controller in reconfiguring the atom function is the sum of $t_2$ and $t_4$, which
is relatively small and constant. The rate-determining step in job starting delay is either $t_1$ or $t_3$, depending
on central controller loading. Except under the most difficult conditions, $t_1 \ll t_3$, which indicates
$t_3$ as the rate-determining step. $t_3$ is the ratio of the volume of software transferred to bus bandwidth.

An estimation of the values of these variables is not easy. However, an order-of-magnitude calculation may
be attempted as follows.

Processor allotment consists of two stages:

1) Find a PM that is available.
2) Update the CBC Table and the CCST.

Step 1 involves (a) accessing a record from the CCST, (b) checking its suitability, and (c) deciding
whether or not to terminate the search.

For a moderately fast system, accessing a record should take much less than 1 µsec. Checking its suitability
involves computing a Boolean function that, again, should take somewhat less than 5 µsec (we assume a clock
rate of 10 MHz). Step (c) is essentially an appendage of (b).

It follows therefore that the total time (in µsec) taken in locating a suitable PM is less than six times
the number of accesses (typically 1). It is usually less than 12 µsec even under poor conditions.

Once a PM has been located, updating the tables entails entering some four variable values in the CCST and
the atom function number in the CBC transactions register. This should involve less than 10 clock cycles
per entry, making 50 clock periods in all, or about 5 µsec.
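
The arithmetic behind these bounds can be checked with a short calculation. The sketch below simply restates the figures quoted in the text (1 µsec per record access, 5 µsec per suitability check, 10 clock cycles per table entry, a 10 MHz clock); the function names are introduced here for illustration only.

```python
CLOCK_MHZ = 10    # clock rate assumed in the text, in cycles per microsecond
ACCESS_US = 1.0   # upper bound on one CCST record access
CHECK_US = 5.0    # upper bound on one suitability check


def locate_time_us(accesses):
    """Upper bound on the time to locate a suitable PM, in microseconds."""
    return accesses * (ACCESS_US + CHECK_US)


def update_time_us(entries=5, cycles_per_entry=10):
    """Time to enter four CCST values and one CBC register value."""
    return entries * cycles_per_entry / CLOCK_MHZ
```

One access gives the 6 µsec bound, two accesses give the 12 µsec quoted for poor conditions, and the five table entries account for the 5 µsec update time.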


Each applications program has a bootstrap portion that loads into the CBC table the variables of interest.
These are the variables the bus controllers are to recognize as being relevant to the particular atom
function in hand. This does not usually take more than 100 µsec. Hence $t_2 \le 100$ µsec, and $t_4$, or the status
check time, is very small and constant. The PM in question runs a test program and sends the results to the
controller. The controller has only to match the answers with the right ones (held in its private memory)
to determine processor status.

It follows then that the total central controller time invested per PM reconfiguration approximates 100 µsec.
Hence $t_1$ is now estimable by:

$t_1 \approx$ housekeeping time for normal activities + $100\,n$ µsec

where

$n$ = number of PM's configured after the request was received from the concerned PM.
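
As a worked illustration of how the delay terms combine, the sketch below evaluates the job scheduling delay (controller response time plus allotment time) and the job starting delay (the same plus software transfer and check times), taking the 100 µsec of controller time per PM reconfiguration from the text. The function names and every numerical input in the usage note are hypothetical, chosen only to show the orders of magnitude involved.

```python
RECONFIG_US = 100.0  # controller time per PM reconfiguration, from the text


def t1_us(housekeeping_us, n):
    """Controller response time: housekeeping plus n earlier reconfigurations."""
    return housekeeping_us + RECONFIG_US * n


def job_delays_us(housekeeping_us, n, t2_us, software_words,
                  bus_words_per_us, t4_us):
    """Return (scheduling_delay, starting_delay) in microseconds."""
    t1 = t1_us(housekeeping_us, n)
    t3 = software_words / bus_words_per_us  # volume divided by bandwidth
    scheduling = t1 + t2_us
    starting = scheduling + t3 + t4_us
    return scheduling, starting
```

For example, `job_delays_us(20.0, 3, 100.0, 4000, 1.0, 10.0)` gives a scheduling delay of 420 µsec and a starting delay of 4430 µsec, so the software transfer term dominates, as the text argues.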

The allotment of PM's proceeds on a priority basis. The controller scans the atom functions that are to be
represented by the PM's and chooses the most important or critical atom function from amongst these for
implementation. This procedure ensures swift implementation of the more important atom functions.
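
A minimal sketch of such a priority-ordered allotment is given below. The use of a heap, the function name `allotment_order`, and the representation of each request as a (criticality, name) pair are assumptions for illustration; the text does not say how the controller's scan is implemented.

```python
import heapq


def allotment_order(requests):
    """Return atom-function names in the order they would be allotted PM's.

    `requests` is an iterable of (criticality, name) pairs, where a larger
    criticality means a more important atom function.
    """
    heap = [(-criticality, name) for criticality, name in requests]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

Priorities are fixed in this sketch: a waiting request never gains precedence over a more critical one.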

The job scheduling and starting delay for reconfiguring individual PM's ideally have identical profiles;
only the constants involved are different.

One possible outcome of the PM induction procedure is that under extreme circumstances it may so happen that
the least important atom function will never get assigned. This could be forestalled by automatically updating
priority as a monotonically increasing function of waiting time. We choose, however, not to do so
since the more critical functions must never be impaired for more than the minimum possible duration. In any
case, as we shall see, this problem is more academic than real.

Figure 4 depicts the job scheduling delay as a function of the precedence in the job request queue. The
precedence in the job request queue is easy to determine. We have two distinct conditions under which
allocation is carried out. The mission profile may change or processors could suffer random knockouts.

The former case involves an entirely new set of atom functions simultaneously being required. The precedence
in the waiting queue is then simply the relative importance of the function in relation to the others
in the new set.

A more complicated case (theoretically speaking) arises in random knockouts. As was mentioned earlier, it
is entirely possible that the random knockout of processors should occur at such a pace and in such a sequence
as to effectively kill a lowly atom function. This can, however, occur only when more than one
failure occurs more or less simultaneously. This is highly improbable. The probability of failure of a PM
is around $10^{-4}$ per hour. Reconfiguration takes less than 100 µsec for the controller to achieve. The probability
of a processor failing in that time is approximately $10^{-11}$. For any atom function to be kept waiting
for central controller attention for time T, the number of more critical PM's that must fail is T/100 (T in µsec), since
100 µsec is approximately the time required by the central controller to reconstitute the injured triad.
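
The order of magnitude of this figure can be checked directly. The sketch below assumes a constant failure rate, so that the probability of failure in a short window is the rate multiplied by the window length; the function names are introduced for illustration.

```python
FAILURE_RATE_PER_HOUR = 1e-4   # PM failure rate quoted in the text
RECONFIG_US = 100.0            # controller time to reconstitute a triad
US_PER_HOUR = 3600 * 1e6


def p_fail_during(window_us, rate_per_hour=FAILURE_RATE_PER_HOUR):
    """Probability that a given PM fails within a window of window_us."""
    return rate_per_hour * window_us / US_PER_HOUR


def failures_needed(wait_us):
    """Number of more critical PM failures needed to cause a wait of wait_us."""
    return wait_us / RECONFIG_US
```

`p_fail_during(100.0)` evaluates to roughly 3 x 10^-12, which is within the rounded figure of about 10^-11 given above. Keeping a function waiting for 500 µsec would need five such failures in quick succession.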

4.2  Bus  Access  Delay 

The system consists of a set of processors communicating by means of two sets of buses, a data bus and a
control bus, both triply redundant for adequate fault tolerance. The control bus is permanently captured
by the central cluster controller and employed in such activities as test initiation, bus grant and rebroadcast
command as well as the DISCONNECT command issued by the central controller to a failed processor. The
data bus is allocated to whichever processor needs it on a First Come First Served basis. The average access
delay and maximum access delay as a function of the bus demand profile are studied.

The  actual  procedure  for  determining  delay  is  very  simple.  Bus  grant  requests  are  put  into  a  time  indexed 
array  (a  list)  in  the  order  in  which  they  arrive.  The  central  controller  steps  through  the  items  in  the 
list  granting  access  to  the  oldest  item  still  outstanding.  The  time  at  which  this  bus  grant  is  achieved  is 
noted  and  the  delay  is  computed  by  subtracting  the  request  arrival  time  from  the  bus  grant  time.  Using  these 
figures,  it  is  possible  to  arrive  at  values  for  the  maximum  wait  time  for  bus  grant  and  for  the  average  wait 
time.  Both  parameters  are  of  interest  in  evaluating  the  system;  they  have  an  important  role  to  play  in  the 
validation  process. 

The  specific  case  we  have  described  here  is  for  20  central  cluster  requests  per  "request  cycle".  The  figure 
of  20  may  appear  somewhat  arbitrary,  but  in  fact  represents  a  system  of  typical  complexity.  Again,  we're 
looking  more  for  the  shape  of  the  response  profile  than  for  actual  numerical  values. 

The input arrival rate profiles studied are as in Figure 5. They range from the uniform (1 request
per interval) to the very bursty (20 requests in the first frame; 0 elsewhere). The average and maximum
delay values that result are graphed in Figure 5.
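
The delay calculation described in this section can be reproduced with a short simulation. The sketch below is not the authors' program: it assumes that exactly one bus grant is made per interval, and the function name `fcfs_delays` and the two 20-interval profiles are chosen here to mirror the uniform and bursty extremes mentioned above.

```python
from collections import deque


def fcfs_delays(arrivals_per_interval):
    """Simulate first-come first-served bus grants.

    arrivals_per_interval[k] is the number of requests arriving in interval k.
    Assumes (for this sketch only) one bus grant per interval.
    Returns (average_delay, maximum_delay), both measured in intervals.
    """
    queue = deque()   # arrival times of outstanding requests, oldest first
    delays = []
    t = 0
    while t < len(arrivals_per_interval) or queue:
        if t < len(arrivals_per_interval):
            queue.extend([t] * arrivals_per_interval[t])
        if queue:
            arrived = queue.popleft()   # grant the oldest outstanding request
            delays.append(t - arrived)  # grant time minus arrival time
        t += 1
    if not delays:
        return 0.0, 0
    return sum(delays) / len(delays), max(delays)


uniform = [1] * 20          # one request per interval
bursty = [20] + [0] * 19    # all 20 requests in the first interval
```

Under the one-grant-per-interval assumption the uniform profile incurs no waiting at all, while the bursty profile gives an average delay of 9.5 intervals and a maximum of 19.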

4.3  System  Reliability* 

Reliability  is  a  measure  of  the  probability  of  failure.  In  a  system  as  complicated  as  ours,  there  are  clearly 
many  classes  of  failure.  These  are  listed  below. 

Type 1 failure: A type 1 failure is said to have occurred when the capacity of the system to compute a
particular atom function has been permanently removed. (By 'permanently' we mean of course till the system
is manually serviced.) Since there are many atom functions, more than one type 1 failure can have occurred
in the system at any one time.

* This portion is drawn from (Shin, K.G., and Krishna, C.M., December 1980).




Type 2 failure: A type 2 failure is said to have occurred when the capacity of the system to compute a
particular atom function has been temporarily removed. We subdivide this class into two subclasses.

Type 2a: A type 2a failure occurs when the impairment of system function has occurred for the time needed
to switch from active to backup units. This time is relatively small.

Type 2b: A type 2b failure occurs when the impairment lasts for as long as it takes to reallocate functions
among the central cluster processors.

Generally, a type 2b failure takes much longer to recover from than does a type 2a failure.

Probabilities of failure can be deduced as follows:

Let

$m_i$ = number of output actuator triads forming the sink set of atom function $i$,
$n_i$ = corresponding number for sensor triads in the source set of atom function $i$,
$p_{sens}$ = probability of a sensor failure,
$p_{act}$ = probability of an actuator failure,
$p_{bus}$ = probability of a bus failure,
$p_{proc}$ = probability of processor failure.

It bears pointing out at this stage that the above probabilities are very small; we note that a typical
range is $10^{-4}$ to $10^{-7}$ per hour. With these values in mind:

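Probability of a type 1 failure:

$p_1$ = [expression garbled in the source; legible terms: $p_{sens}^3$, $p_{act}^3$, $p_{proc}$, $p_{bus}$]

Probability of a type 2a failure:

$p_{2a}$ = [expression garbled in the source; legible terms: $p_{act}$, $p_{proc}$, $p_{bus}$]

Probability of a type 2b failure:

$p_{2b}$ = [expression garbled in the source; legible term: $p_{proc}$]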


where $a$ = number of atom functions in the mission. To obtain a feeling for the actual figures involved, we
employ the following probability estimates:

$p_{proc} = 10^{-4}$/hr, $p_{act} = 10^{-6}$/hr, $p_{sens} = 10^{-6}$/hr, $p_{bus} = 10^{-5}$/hr.

Assume for simplicity that all atom functions are identical with respect to hardware requirements and

$n_i = 2$ for all $i$, $m_i = 1$ for all $i$, $a = 15$.

In such a case,

$p_1 = 0.5 \times 10^{-11}$ per hour, $p_{2a} = 0.5 \times 10^{-3}$ per hour, $p_{2b} = 10^{-6}$ per hour.

Note here that a type 1 failure is the only true failure in the system sense; type 2a and 2b failures result
in system reconfiguration with some loss of throughput, but no system impairment of more than a temporary
nature.

5.  COMPARISON  WITH  FTMP 

We turn now to comparing two similar architectures: the DMACS and the C. S. Draper Laboratory's Fault-Tolerant
Multiprocessor (FTMP). It is not our intention in this section to seek to make definitive judgments upon
the relative worth of the systems, only to describe the implications of a set of design options taken in
each case.

The FTMP is, in hardware terms, superficially similar to our architecture. For instance, it is a bus-oriented
machine, with triple redundancy being used throughout for the detection and correction of errors.

The Draper Laboratory has essentially chosen a different set of options from ourselves. A study of the
differences together with a brief overview of the implications is worthwhile since it brings out in sharp
relief the tradeoff options available to the systems architect.




The  major  points  of  difference  are: 

• The processors in a triad operate in tight synchronism in the
FTMP while operation is completely asynchronous in the DMACS
architecture.

• FTMP is essentially the central portion of an aircraft computer
control facility: data acquisition and delivery are not considered
in much detail. The DMACS architecture explicitly incorporates
sensors, actuators and associated processing equipment into the
system.

• Job scheduling in FTMP is completely dynamic; the system controller
is, for all practical purposes, a job scheduler. The DMACS system
involves quasi-static job scheduling.

• The bus structures are different. FTMP has a "Mass Memory Bus",
an "Internal I/O Bus" and an "External I/O Bus" while the DMACS
makes do with just two sets of buses: a data bus set and a control
bus set.

• The basic processing module is far simpler in FTMP than in the
DMACS.

We provide below a more detailed exposition of the consequences of the differences noted above. In the
FTMP, all elements of the multiprocessor operate using a common time reference. Four mutually phase-locked
clock generator modules operating together provide a fault-tolerant time reference (Lala, J.H., & Smith, C.J.,
1979). The effect of this tight synchronism is to make data transfer between processors and memory and processors
and processors simpler and therefore faster. A common clock obviates the need for a pseudo-handshake
as is used in the DMACS architecture. However, for this benefit in lowered intercommunication complexity, we
have to pay in terms of reduced reliability. The disabling of the clock will be disastrous to the system;
and while the existence of four clock modules phase-locked to each other assures considerable fault-tolerance,
the synchronism nonetheless introduces an additional potential source of catastrophic failure. An additional
consequence of this is seen in the consideration of the third point in the above to which we shall come.

FTMP  has  an  External  I/O  bus  and  an  External  I/O  port  that  handle  data  input  and  output.  No  restriction  is 
therefore  placed  on  the  hardware  acquiring  and  using  data:  the  configuration  of  the  sensors  and  actuators 
is  undefined.  This  makes  for  easier  adaptability  to  existing  systems.  The  FTMP  can  therefore  be  "added  on", 
so  to  speak. 

On the other hand, the DMACS architecture imposes a certain structure upon the actuators and sensors. The
reason is that we felt that characterizations of the system would be invalid if they did not include the
communication with the environment as part of the system; this is, after all, the very reason for
the existence of the computer system in the first place.

Job scheduling in the FTMP is entirely dynamic. The Draper Laboratory justifies this after considering
the alternative, which is the synchronous job scheduler. In synchronous job scheduling, periodic jobs
are completely prescheduled, with each job occupying a predefined time slot in the schedule. The main
advantage is low central control overhead. The Draper Laboratory claims that the major disadvantage
of this kind of algorithm is its lack of flexibility and the complexity of preassigning jobs to processors in
a three-unit multiprocessor. Moreover, failure of one of the processors in a triad, or changes in job parameters
such as iteration rates, may require a totally new schedule. The synchronous scheduler is therefore not
adopted in FTMP (Lala, J.H., and Smith, C. J., 1979). Instead, a complete scheduling pass is carried out whenever
an atom function has to be executed.
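
The contrast between the two scheduling styles can be sketched as follows. This is an illustration only: the job names, the two-slot frame and the helper `dispatch_dynamic` are hypothetical and are not taken from the FTMP design.

```python
from collections import deque

# Synchronous scheduling: every periodic job owns a fixed slot in a frame.
# Hypothetical three-processor frame; all names are illustrative only.
STATIC_FRAME = [
    {"P0": "pitch_law", "P1": "roll_law", "P2": "air_data"},
    {"P0": "pitch_law", "P1": "yaw_law", "P2": "nav_update"},
]

def static_assignment(slot_index):
    """Look up the prearranged job-to-processor map for one time slot."""
    return STATIC_FRAME[slot_index % len(STATIC_FRAME)]

def dispatch_dynamic(ready_jobs, free_processors):
    """Dynamic scheduling: pair each ready job with any free processor.

    Returns the assignment and the jobs left waiting.
    """
    jobs = deque(ready_jobs)
    assignment = {}
    for proc in free_processors:
        if not jobs:
            break
        assignment[proc] = jobs.popleft()
    return assignment, list(jobs)
```

If processor P1 is lost, the static frame must be rebuilt by hand, whereas `dispatch_dynamic(jobs, ["P0", "P2"])` simply carries on with the remaining processors, at the cost of running the dispatcher for every job.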

The problems pointed out in the remarks above are very real, but we believe they follow partly from the
tight synchronism the FTMP is forced to operate in. In an asynchronous and highly decentralized system
such as ours, all the advantages of on-time job execution with practically no delays and a high load factor
are available (as we have seen in the performance characteristics) without the disadvantages mentioned above.

Again, when the mission profile changes, requiring a large-scale reallocation, the applications software for
the new atom functions thereby introduced is loaded (in the DMACS) using DMA and a conceptually simple
procedure. Reconfiguration time in such cases is very low.

Our bus system is conceptually simple. All data (whatever its origin) is treated in the same way and
broadcast on the data bus according to the same format. This simplifies malfunction detection and handling and
makes the systems software less complex. A control bus is used in addition to the data bus to simplify
central controller intervention in the system. The bus structure of FTMP is much more complex. While it
does not necessarily follow that reduced reliability is the result of such increased complexity, complexity is,
in our opinion, to be avoided wherever possible.

All differences in structure and performance between the FTMP and DMACS can be held to issue from one basic
difference in design philosophy: FTMP IS LESS DECENTRALIZED THAN OUR SYSTEM. In FTMP the central controller has a
major role to play in finding a processor every time an atom function is to be executed, no matter whether
the atom function is periodically required or not. The central controller (which, as has been pointed
out, is basically a job scheduler) is thus involved in scheduling even continuously periodic functions.

The  result  is  a  continuous  high  loading  upon  the  controller  and  a  relatively  high  overhead  in  the  form 
of  applications  software  transfer.  This  may  result  in  needlessly  slowing  down  the  system. 

The DMACS architecture, on the other hand, follows a consciously laid down policy of maximum decentralization.
The central controller is involved in regular activities mainly in arbitrating access to the data bus.

Regular housekeeping chores are therefore not time-consuming. This has the merit that when an abnormal
event takes place the controller response is faster than it would otherwise be. The controller delegates
many of the routine chores to the control bus controllers in the various processing modules.

It is worth pointing out that the FTMP system is far older than our own. Consequently, it has undergone
more detailed analysis than the DMACS. For one thing, a prototype version of FTMP has been constructed
while our system is as yet in the design stage. All our remarks should therefore be read in this context.

6.  CONCLUSIONS  AND  DISCUSSION 

This paper has sought to describe a distributed processor structure for the effective control of military
aircraft. The goal has been to configure, out of readily available components of military-grade reliability,
a computer system that is at the same time easy to expand, service and program, and that is reliable
and flexible enough to accept a considerable number of alterations without requiring a major revision of
the basic structure.


This project was motivated by a desire to employ the concept of Integrated Control in fighter aircraft.

Ad-hoc addition of components to aircraft results in wasteful redundancy that does not necessarily translate
into increased real redundancy from the performance point of view. Again, there is the very real possibility
of one element in the system affecting the performance of another, thus degrading the overall system
performance. This is clearly an unsatisfactory state of affairs but one that occurs frequently in extremely
complex systems. The conception and design of the system as a whole generate certain problems. However,
the holistic design of complex systems provides one with an opportunity to carry out optimisation with
respect to the whole system and not just with respect to one isolated component portion of it. By pooling
all resources into one large system it is always possible to provide increased reliability to the more
critical functions. Fighter aircraft today are designed to fly to the edge of instability and designers
push the inherent properties of the basic mechanical structure of the aircraft to the maximum possible
extent. In such a dynamic (and not always friendly) environment, it is essential that the reliability
of the basic critical flight functions be extremely high. The high reliability required of any system used
aboard a manned aircraft has to be achieved by using components that by themselves are far less reliable than
that. Commonly used figures for the reliability of processors peg the failure rate at around 10^-4 failures
per hour. Mechanical devices such as actuators do not show a very great improvement upon this figure. It
is therefore incumbent upon the structure or the configuration of the computer system to create, out of
components that are by themselves not very reliable, a super-reliable system.

The requirement of high throughput is no less important than that of reliability. The fighter aircraft
operates in a highly dynamic environment and much of the data from the sensors has to be processed in
real time. The environment is characterised by rapid surges in demand upon the services of the computer system.
The system must therefore be robust enough to absorb such surges without a lowering of reliability.

A third requirement is ease of programming and system flexibility. A system that is not easy to program is
potentially very expensive to operate and is prone to errors. System flexibility and modularity are required
for ease of servicing and maintenance.

The present system is based upon the three basic requirements listed above. Reliability is provided through
the use of triple-modular redundancy with voting and a conceptually simple structure.
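
As a reminder of what triple-modular redundancy with voting buys, the sketch below gives a two-out-of-three voter and the textbook reliability expression for three independent units with a perfect voter. It is a generic illustration under those stated assumptions, not the voter used in DMACS; the function names are ours.

```python
from collections import Counter

def majority_vote(a, b, c):
    """Return the value at least two of the three channels agree on.

    Raises ValueError when all three disagree (no majority exists).
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among the three channels")
    return value

def tmr_reliability(r):
    """Probability that at least two of three independent units work,
    each with reliability r, assuming a perfect voter:
    3 r^2 (1 - r) + r^3 = 3 r^2 - 2 r^3."""
    return 3 * r**2 - 2 * r**3
```

For example, three units of reliability 0.9 give a voted reliability of about 0.972. The gain holds only while r exceeds 0.5, which is why faulty units must be detected and replaced promptly.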

A high throughput (or low bus-access delay, which is equivalent to high throughput in our case) is achieved
by means of using two sets of buses instead of just one. The control bus triad serves essentially two
purposes: first, it lowers the demand upon the data bus and second, it provides the central controller with a
simple means of propagating control signals. Controller intervention into system activity is not delayed
by ongoing transmissions upon the data bus.

The modularity that is built into the system provides ease of programming together with expandability and
improved serviceability.

It is clear, therefore, that the configuration arrived at is a direct result of the requirements of
reliability, high throughput and flexibility.

There  are,  however,  many  interesting  problems  not  yet  studied  in  any  great  depth.  The  simulation  of  the 
present  structure  has  been  partial  and  with  respect  only  to  specific  characteristics  such  as  job  scheduling 
delays,  reliability  and  bus  access  delay.  A  more  complete  simulation  package  for  the  system  is  planned. 

REFERENCES 

[1] Enslow, P. H., March 1977, "Multiprocessor Organisation - A Survey", Computing Surveys, Vol. 9, No. 1,
pp. 103-129.

[2] Hopkins, A. L., et al., October 1978, "FTMP - A Highly Reliable Fault-Tolerant Computer for Aircraft
Control", Proceedings of the IEEE, Vol. 66, No. 10, pp. 1221-1239.

[3] Lala, J. H. and Smith, C. J., 1979, "Performance and Economy of a Fault-Tolerant Multiprocessor",
New York, Proceedings of National Computer Conference, pp. 481-492.


[4]  Robinson,  A.C.,  Hitt,  E.F.,  December  1978,  "Integrated  Control  -  A  Unified  Approach  to  Management  of  an 
Aircraft",  Task  Final  Report,  AFFDL/AC,  Contract  No.  F33615-76-C3145. 


[5]  Shin,  K.  G.,  December  1979,  "System  Architectures  for  Implementing  Integrated  Control  Approach  to 
Management  of  a  Military  Aircraft",  Final  Report,  AFFDL/AGC,  Contract  No.  F  33615-76-C314' ,  Request 
No.  49. 


[6] Shin, K. G., and Krishna, C. M., December 1980, "A Distributed Microprocessor System for Controlling
and Managing Military Aircraft", Miami, Florida, Proceedings of Distributed Data Acquisition, Computing
and Control Symposium.

[7] Wensley, J. H., et al., October 1978, "SIFT, The Design and Analysis of Fault-Tolerant Computer
for Aircraft Control", Proc. IEEE, Vol. 66, No. 10, pp. 1240-1255.

[8] White, J. A., et al., October 1979, "Multi-microprocessor Flight Control System Architectural Concepts",
Los Angeles, CA, Proceedings of AIAA Computers in Aerospace Conference, pp. 87-92.


Figure 1. Overview of System Architecture

Figure 4. Job Scheduling Delay (labels: precedence in job queue; T: housekeeping time)




THE  DEVELOPMENT  OF  ASYNCHRONOUS  MULTIPROCESSOR  CONCEPTS  FOR 
FLIGHT  CONTROL  SYSTEM  APPLICATIONS 


S.  M.  Wright  and  J.  G.  Brown 
British  Aerospace 
Brough 

North  Humberside 
United  Kingdom 


SUMMARY 

In early 1979 a limited research investigation was initiated to examine the possible impact that recent
advances in large scale integrated circuit technology might have if applied to flight control systems.

The initial studies concentrated on alternative digital processor architectures.

One promising research avenue was identified as being the use of multiple microprocessors, each functionally
dedicated and running asynchronously with a short program cycle time. This approach promises benefits in
a number of areas:

1. ease of generating/proving high integrity software

2. reduced propagation delays

3. reduced hardware/software synchronisation overheads

4. retention of classical feedback control design techniques

5. extendable processing power

Part of the ongoing Active Control Technology activity at BAe Brough is an involvement in a flight dynamics
research programme using an RAE Hunter aircraft converted to fly-by-wire. This programme has identified a
need for a flexible digital flight control processor for such research and has provided a focus and stimulus
for multiprocessor studies. As a result BAe Brough are now in the process of developing a multiplex digital
full authority flight control computer for this specific application, with a view to installation in the
aircraft.

Because of the short timescale of this particular application, certain questions which would relate to a
production system have been circumvented rather than resolved. However, this does not affect the concept,
which is considered to be of considerable interest and relevance to future systems.

1. INTRODUCTION


As aircraft designers have striven for more and more aerodynamic performance, the aircraft's natural
stability and control characteristics have deteriorated. This has required the application of increasingly
complex feedback control systems in order to artificially restore good handling characteristics. This
process has reached the stage where some current and most projected combat aircraft are totally reliant on
the correct and continuous operation of these active control systems. Typically, present generation
aircraft use analogue computation which is multiple redundant in order to achieve the integrity targets
necessary for a flight critical function, e.g. Tornado or F16.

Analogue computation, however, restricts the type and complexity of control law which can be applied. It
can also cause production and/or maintenance problems in achieving the required level of matching of
characteristics between one computer and the next, and it can be difficult to modify the control
characteristics.

These problems have led designers towards the application of digital processors to computation of the
control action since they promise to substantially reduce all the above problems. Digital computers offer
the additional advantages of being able to incorporate extensive built-in and pre-flight tests, together
with a reduced size and weight. The overall architecture of the system has remained as a number of
identical lanes each designed around a central digital processor. In practice it is becoming increasingly
apparent that the software costs associated with such systems are very high. These costs are principally
associated with the flight critical nature of the computation since, if there were any fault in the software,
then it would occur simultaneously in all lanes of the system, causing possible loss of the aircraft. In
order to minimise the possibility of such a failure, great reliance is placed on independent and
comprehensive cross checking of the operation of the control program, and on the management system
established to ensure compliance with these safeguards. Further costs are introduced by the need to
produce large amounts of code which must be optimised so that any computation time delays are minimised.

This involves programming at assembler level, and this in turn requires the establishment of a dedicated
team of experienced programmers. This results not only in high costs but also in long timescales from
control law specification to the production of verified software. This may be acceptable for a production
aircraft but certainly not for a research aircraft, and probably not for the development phase of a new
aircraft since here the ability to rapidly modify the control characteristics is essential.

Even after the most comprehensive software testing there will still be some concern that there could be
some latent fault present in the program which would only manifest itself under a particular combination
of circumstances, resulting in a catastrophic failure. This is due to the very large number of states
that a digital processor can enter, corresponding to different data values and paths through the program.
Because of these problems it was considered worthwhile re-examining the basic concept of the central
digital processor to see if a different hardware approach could ease the task of software generation,
particularly with reference to research and development aircraft but potentially for general application.

The aim of this investigation was to reduce the magnitude of the software task to a level where it could
be accommodated by an on-site team, to suggest ways of generating visible, testable software less prone to
context dependent failures, and to provide a system with the type of excess computational power that would
allow the convenient investigation of advanced control concepts. This approach was considered viable
firstly because the amount of programming normally associated with control functions could be less than ?%,
the rest being accounted for by various housekeeping functions such as consolidation, built-in and
pre-flight test, failure management etc. (Ref. Corney 1979), and secondly because of the considerable
advances being made in the field of large scale integrated circuit technology, particularly in the area
of microprocessors.

2.  PROPOSED  PROCESSOR  ARCHITECTURE 


2.1.  Relevance  of  a  multiprocessor  approach 

The first step is to attempt to partition the software into small modules which can only interact in a
limited number of well defined ways which, if possible, should ease the specification, modification and
testing of the program. This type of partitioning has been implemented on ground based systems using
specialised operating systems, e.g. MASCOT (Ref. Jackson A. and Simpson H. R. 197U), and has the considerable
advantage that the interface between modules is so well defined that an individual module can be removed,
modified and replaced without requiring re-validation of all the other software modules comprising the
system. However, this technique implies the use of complex executive software which in itself would be
difficult to develop and validate to the required level of confidence, and would be an extra overhead on
the control processor's time. Hence both the software and hardware need to be partitioned, i.e. each
software module can be allocated its own dedicated processor such that all the tasks run in parallel.

Since the constituent processors in such a system are running in parallel, the computation time delay
can be significantly reduced, hopefully to a value where it becomes insignificant. This should relax the
otherwise stringent requirement of producing highly optimised code to minimise execution time. It could
also simplify the control law design task since, if time delays are insignificant, then the analysis and
design of the control system can be accomplished using classical linear control theory with no need to
resort to Z transform techniques.
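
One way to judge whether a computation delay is "insignificant" is to look at the phase lag it adds at the frequencies of interest: a pure delay T contributes 360 f T degrees at frequency f. The short sketch below evaluates this; the 10 Hz frequency and the 20 ms comparison delay are hypothetical values chosen only for illustration, while 0.12 ms is the program pass time quoted in Section 3.1 for the 2920-16.

```python
def delay_phase_lag_deg(delay_s, freq_hz):
    """Phase lag, in degrees, added by a pure time delay at one frequency."""
    return 360.0 * freq_hz * delay_s

# Hypothetical comparison at a 10 Hz frequency of interest.
fast_pass = delay_phase_lag_deg(0.12e-3, 10.0)   # short asynchronous pass
slow_frame = delay_phase_lag_deg(20e-3, 10.0)    # assumed 20 ms frame delay
```

Under these assumptions the short pass adds well under one degree of lag, against some tens of degrees for the longer frame, which is the sense in which continuous-time design methods remain usable.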

An additional advantage of a multiprocessor configuration is that it provides extendable processing power,
thus allowing flexible incremental enhancement of the control capabilities of a system, either in response
to new applications, or to a gradual development of the original application.

2.2.  Choice  of  a  communication  strategy 

The traditional difficulty with a multiprocessor system lies in the choice of a communication strategy.

If  a  bus  structure  is  chosen  then  the  throughput  of  the  bus  can  constrain  system  expansion  or  introduce 
variable  time  delays  dependent  on  bus  loading.  If  a  multiport  memory  technique  is  used  then  this  limits 
the  number  of  processors  which  can  be  attached.  More  advanced  concepts  such  as  packet  switching  networks, 
could  comprise  a  research  programme  in  their  own  right.  With  any  of  these  systems  it  is  difficult  to 
constrain  access  between  processors  such  that  the  effect  of  a  software  fault  in  one  processor  cannot  cause 
unpredictable  software  faults  in  other  modules. 

A network communication strategy does not inhibit system expansion since the number of links can be
increased indefinitely to accommodate the extra data traffic caused by additional processors. If the
links are constrained arbitrarily to carry data but not control information then the effects of failure in
any one module become predictable, hence allowing the possible containment of failures which occur in non
flight critical sections of code. This should either improve reliability in operation, or reduce the
burden of testing. This inherently rigorous control over the interface between software modules also
reduces the potential for adverse interaction between the sections of code produced by different members
of a programming team either during the initial program development phase, or after modification of an
individual module, since the structured programming concept of having locally defined variables only
available locally is implicit in this system.

Typically the data being transmitted between processing modules in a real time control system consists of
a number of variables each representing a continuous function of time. If each variable is allocated a
unique communication channel then the operation of the system can be conveniently monitored, hence easing
testing and acceptance procedures. Since each variable represents a continuous function of time and cannot
be overwritten except by a more recent value, then it becomes possible to dispense with the need for
handshake routines, interrupt handlers, or other software protocols, again easing the programming task.
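
The idea of a dedicated channel per variable, with no handshake, can be modelled in a few lines. The class below is a hypothetical software analogy only (in the system described the links are analogue signals): a single writer replaces the value, and any reader simply takes whatever is most recent.

```python
class VariableChannel:
    """One-way link carrying the latest sample of a single variable.

    Hypothetical model: one writer, any number of readers, no handshake.
    A write replaces the previous value; a read never blocks.
    """

    def __init__(self, initial=0.0):
        self._value = initial

    def write(self, value):
        self._value = value   # newer sample overwrites the older one

    def read(self):
        return self._value    # reader always sees the most recent sample


def run_asynchronously(channel, producer_samples, reads_per_write):
    """Interleave a producer and a faster consumer without any protocol."""
    seen = []
    for sample in producer_samples:
        channel.write(sample)
        for _ in range(reads_per_write):
            seen.append(channel.read())
    return seen
```

A consumer that runs faster than the producer just re-reads the same value, and one that runs slower skips intermediate samples; neither case needs an interrupt or acknowledgement because the variable is a continuous function of time.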

Thus we have the concept of an asynchronous multiprocessor flight control computer. If this computer can
be made from a number of identical hardware blocks then there is also potential for reduced hardware costs
and increased flexibility in configuring a control computer to meet different applications, requiring
perhaps different levels of redundancy, or processor power. This is in addition to the system design,
programming, and integrity benefits already suggested.

3. EXPERIMENTAL MULTIPROCESSOR SYSTEM

3.1.  Choice  of  Processor 

Having decided to investigate the implications of this type of asynchronous multiprocessor concept, the
first task was to choose a commercially available microprocessor which could demonstrate the major
features of the idea without involving an extensive hardware development programme. The relatively large
number of processor modules anticipated for a practical system focused our attention on single chip
microprocessors in order to keep the volume of the final system within practical bounds. One simple
method of achieving asynchronous communication between processors is to use analogue interconnections;
this, together with a requirement for high processor speed and UV Erasable PROM programming, narrowed the
available field down to one device, the Intel 2920 Analogue Signal Processor. (See Figure 1 for a
functional block diagram and functional specification of this processor.)

The choice of a device with analogue interfaces also allows convenient integration with existing analogue
flight control systems with which we are involved and offers the possibility of enhancing these systems
and developing control computing ideas in parallel.



There are, however, some characteristics of this device which, while they would not affect any concept
proving exercises, might prejudice application to a realistic control task, viz:

1. I/O resolution of only 9 bits (internal resolution of 25 bits)

2. Limited instruction set (no branch instruction)

3. Limited program space

4. New device of uncertain detailed characteristics

It was decided to proceed with a concept proving exercise since it was anticipated that, if successful,
then the above limitations could be overcome if necessary on a fully engineered system, probably using
a different processor. Alternatively this initial study might show that the short term expedient
application of this device to current flight control tasks was practical.

The resolution of the analogue input and output is at least two bits less than required but, since the
relevant interfaces are an integral part of the device, it was impossible to alter the hardware
characteristics. A software technique was therefore developed which (at the expense of a small external hardware
modification) enabled bandwidth to be traded for improved resolution. This technique has been shown to
be capable of achieving a three bit enhancement with a reduction in interface bandwidth from 8 kHz to 1 kHz.
See Fig. 2 (Ref. Wright and Fletcher 1980).
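
The details of the technique are given in the cited reference and are not reproduced here. As a generic illustration of how bandwidth can be traded for resolution, the sketch below sums eight conversions of a 9-bit truncating converter, each taken with a known offset of k/8 of a least significant bit added before conversion. The offset scheme, the unsigned code range and the function names are assumptions made for this example; the input is assumed constant over the eight conversions.

```python
import math

IO_BITS = 9          # I/O resolution quoted for the device
OVERSAMPLE = 8       # 8 kHz / 1 kHz, giving log2(8) = 3 extra bits

def quantise(x_lsb):
    """Ideal truncating converter; input in units of one 9-bit LSB.

    An unsigned code range is assumed purely for simplicity.
    """
    code = math.floor(x_lsb)
    return max(0, min(2**IO_BITS - 1, code))

def enhanced_sample(x_lsb):
    """Sum eight conversions, each with a known k/8 LSB offset added
    at an (assumed) external summing junction before conversion.

    The result is expressed in units of 1/8 LSB, i.e. a 12-bit code.
    """
    return sum(quantise(x_lsb + k / OVERSAMPLE) for k in range(OVERSAMPLE))
```

With these ideal assumptions the sum equals the input truncated to 1/8 LSB, so an input of 100.625 LSB, which a single conversion reports as 100, is recovered as 805/8. Eight conversions are consumed per enhanced sample, which matches the 8:1 bandwidth reduction quoted above.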

The lack of branch instructions is an advantage from the point of view of software testing since it
dramatically reduces the number of possible context dependent failures. It is also of benefit in the
coding of linear control functions since every instruction is executed every program pass and hence the
iteration rate is constant, independent of the software (equal to 0.12 ms for the 2920-16 operating at
a 600 ns cycle time). However, combined with the limited instruction set and program size, this cast
doubts on the ability to use the device to implement complex control functions, particularly those
involving logarithmic and trigonometric functions, and so several exercises were undertaken to test this.
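
The fixed iteration time follows directly from straight-line execution. The sketch below reproduces the arithmetic and shows, in Python rather than 2920 assembler, what a branch-free program pass looks like. The program store size of 192 words is an assumption drawn from published descriptions of the device, not from this paper, and `limited_lag` with its coefficients is a hypothetical example.

```python
PROGRAM_WORDS = 192        # assumed program store size (not stated in the paper)
CYCLE_TIME_S = 600e-9      # instruction cycle time quoted for the 2920-16

def pass_time_s(words=PROGRAM_WORDS, cycle_s=CYCLE_TIME_S):
    """Every word is executed once per pass, so the pass time is fixed."""
    return words * cycle_s

def limited_lag(state, u, alpha=0.125, limit=1.0):
    """One straight-line pass of a first-order lag followed by a limiter.

    No data-dependent branch: the limiter uses min/max selection, so the
    same operations run on every pass whatever the input value.
    """
    state = state + alpha * (u - state)
    return max(-limit, min(limit, state))
```

Under the stated assumption, 192 words at 600 ns give about 0.115 ms, consistent with the 0.12 ms figure in the text.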

In one case a waveform generation/co-ordinate transformation task was programmed on one processor, including
full four quadrant sine/cosine functions (See Fig. 3 and Ref. Wright, 1980). In a second exercise the
programming of aircraft control laws was investigated. This study indicated that a typical longitudinal
control system, including gain scheduling and special high incidence control laws, could be implemented
on four processors (Ref. Sharaz, 198h). Figure 4 illustrates a representative flight control law element.

The detailed hardware characteristics of the device are still undergoing extensive testing but it appears
that if suitable precautions are taken then this device can be applied in the short term to a number of
control tasks.

3.2.  Standard  computing  and  interface  modules 

In order to use this device in a range of applications without incurring extensive additional hardware
development, a pair of hardware modules was developed, one performing the computing and one the interfacing
function (see Fig. 5). Each is a printed circuit card carrying a number of sub-modules. The PCB tracks
carry the interconnections always required in any application, while those connections needed for a
specific task are added by component selection and appropriate wire links. The design for a computing
card incorporates provision for up to eight microprocessors. All the analogue input and output lines from
each processor, together with a substantial proportion of the edge connection lines, are left uncommitted
ready for suitable wire link patching to suit a specific task.

It is less obvious how the interfacing can be standardised. However, since the Intel 2920 has its own
analogue input/output and the bulk of control signals will be analogue, the interfacing requirement reduces
to a simple one of buffering, antialiasing filter, offset and gain adjustment. This list of requirements
can be accomplished using a single op-amp circuit which can also act as the summing junction needed to
perform the resolution enhancement referred to previously. Even the usually complex antialiasing filter
can be accomplished with a simple first order filter because the very high sample rate relative to signal
bandwidth will tolerate the shallow cut off slope. Thus a PC design comprising eight of these standard
modules on a card has been produced and can be modified by suitable component selection to suit most
tasks. The exceptions to this can be considered as special cases, and for this purpose an uncommitted
area has been allowed on the PC card to cater for any special purpose circuitry.
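
The remark about the first order antialiasing filter can be checked with the standard magnitude response of a single-pole low-pass filter. In the sketch below the 50 Hz cut off frequency is a hypothetical value chosen for illustration; the 4 kHz evaluation point is roughly half the sample rate implied by the 0.12 ms program pass.

```python
import math

def first_order_gain_db(freq_hz, cutoff_hz):
    """Magnitude response of a first-order low-pass filter, in dB:
    -10 log10(1 + (f / fc)^2)."""
    return -10.0 * math.log10(1.0 + (freq_hz / cutoff_hz) ** 2)

# Hypothetical case: 50 Hz cut off, evaluated near half the sample rate.
attenuation_near_nyquist = first_order_gain_db(4000.0, 50.0)
```

With these assumed figures a single pole already gives an attenuation of the order of 38 dB at half the sample rate, which illustrates why the shallow 20 dB per decade slope is tolerable when the sample rate is far above the signal bandwidth.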

4. DOCUMENTATION AND TESTING

The foregoing sections have shown that the software generation task can be reduced by functionally partitioning
the hardware and software, and how this might be achieved in practice. However, this software still has to
be free of errors and there is still a need for rigid specification, documentation, test and acceptance
procedures. It is interesting to note the parallels between the computing philosophy outlined above and a
general purpose analogue computer. This suggests a possible documentation and test philosophy based on
conventional analogue practice which should ensure maximum visibility to all concerned (see Fig. 6).

Following  design,  analysis  and  simulation,  control  laws  are  normally  specified  by  functional  block  diagrams. 
This  form  is  readily  converted  into  a  full  computer  specification  by  identifying  the  function  of  each 
processing  module  and  the  details  of  the  interconnections  both  between  modules  and  external  to  the  computer 
(including  analogue  signal  levels,  scale  factors  etc).  Each  module  functional  specification  can  then  be 
converted into a program and thence into discrete hardware. The documentation of this module would comprise
the  program  listing  (fully  commented),  a  definition  of  the  scaling  and  truncation  of  all  intermediate 
variables,  and  a  definition  of  all  possible  context  dependencies. 

A test procedure is required for each module, independently derived from the module functional specification.
This will be a hardware functional check to be performed on the processing module after programming by
stimulating the inputs and monitoring the outputs. The tests will exercise all inputs, internal variables
and outputs over their full range of amplitude and frequency and will check for correct operation of all
conditional instructions.
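
A functional check of this kind amounts to sweeping the stimulus and comparing the response with the specification. The sketch below shows the amplitude part of such a sweep; it is a hypothetical software analogy of the hardware test, and the function name, tolerance and example module are assumptions made for illustration.

```python
def sweep_module(module, reference, amplitudes, tolerance):
    """Stimulate a module over a grid of input amplitudes and collect
    every point where its output departs from the specification.

    Returns a list of (input, measured, expected) tuples; an empty
    list means the module met the specification at every test point.
    """
    failures = []
    for x in amplitudes:
        got, want = module(x), reference(x)
        if abs(got - want) > tolerance:
            failures.append((x, got, want))
    return failures
```

For instance, if the specification is a gain of two with limiting at plus and minus one, a module that omits the limiter passes every point inside the linear range and fails only beyond it, which is why the sweep must cover the full amplitude range rather than a few nominal points.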

The  test  procedure  serves  both  to  verify  the  software  and  to  check  that  the  processor  has  been  correctly 
programmed and is functioning correctly. Since the check out is fully comprehensive there is no need
for  separate  software  verification  procedures  based  on  emulations  or  other  computer  based  procedures. 

The  complete  computer  would  be  functionally  tested  in  a  similar  way,  testing  all  input/output  interfaces, 
all  communications  between  modules,  all  modes  of  operation  etc.  but  without  having  to  repeat  the  exhaustive 
software  checkout  since  no  module  can  affect  the  correct  functioning  of  any  other  module. 

This reliance on functional specification and test promises to reduce the magnitude of the documentation
task, and the increased visibility of the testing process should give improved confidence that the final
product will operate consistently and correctly.

5.  CURRENT  APPLICATIONS 

Of several applications being pursued, the most challenging involves the Royal Aircraft Establishment's
Fly-by-Wire Hunter aircraft, which is currently being operated jointly with British Aerospace, Brough, on
a flight dynamics research programme. This aircraft is currently fitted with a quadruplex analogue
active control system. It is hoped that by gradually introducing a number of processors into this system
the multiple asynchronous microprocessor concept can be proved, while at the same time enhancing the
capability (in terms of flexibility and complexity of control laws) of the existing system. It is
intended that a number of the processing and interfacing cards be configured as a duplex, fail passive
computer which can be used for ad hoc extensions to the existing control law computations. Initially
this will be of limited authority but, as confidence is gained in the system, more comprehensive control
functions can be added with wider authority until all of the present analogue control law implementation
has been replaced. This should provide a considerable increase in utility of this aircraft as an
experimental vehicle, and should expose these asynchronous multiprocessor concepts to a realistic test
of their practicality at an early stage. If this work is successful it is then anticipated that further
extension of the digital computing sections could allow the failure management, built-in and pre-flight
test functions (currently implemented with analogue techniques) to be updated until a multiprocessor
configuration was achieved which would be fully representative of a production flight control computer
configuration.

In  addition  to  the  above  flight  control  applications  there  is  a  need  to  investigate  the  maintainability 
and  survivability  of  such  a  system  and  its  possible  application  to,  and  implications  on,  the  rest  of  the 
aircraft  systems.  With  this  in  mind  a  laboratory  breadboard  of  a  multiplex  system  is  being  developed  to 
study  the  flight  control  system  architecture  per  se  and  also  to  provide  a  means  of  emulating  a  flight 
control  system  to  study  the  interaction  with  other  systems.  This  work  is  therefore  closely  tied  to  other 
development  work  involving  avionic  and  hydraulic  systems  rigs. 

6.  FUTURE  DEVELOPMENTS 

If, in pursuing these research tasks, the experimental system provides the flexibility and performance
that is hoped for, then the next step would be to develop a fully engineered version. At that stage
the choice of processing module would be reassessed in the light of experience, bearing in mind the less
severe constraints on hardware development. In particular the choice of analogue communication between
modules, while being expedient in the short term, is inappropriate for an engineered system since it
reintroduces some of the problems of analogue systems: it is susceptible to noise pick up, gain variation
and offset problems, and can have significant variation of characteristics with temperature. A possible
alternative is to use a small dual port memory as an asynchronous buffer between each pair of processing
modules. There would then be more freedom of choice of processing module and this could make other
desirable features such as hardware multiply available. This type of development would result in a chip
set rather than a single chip processing module. This could conveniently be integrated using a hybrid
packaging technique to retain the circuit design advantages of a simple modular structure.
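
The idea of a small memory acting as an asynchronous buffer between two modules can be illustrated with a short simulation. This is a sketch under assumptions, not the design proposed in the paper: the class name `DualPortBuffer`, the two-slot layout, the sequence counter and the tick periods are all hypothetical, and the simulation is single-threaded, so it ignores the arbitration of truly simultaneous accesses that real dual port hardware has to provide.

```python
class DualPortBuffer:
    """Two-slot mailbox: the writer fills the idle slot, then flips `latest`."""

    def __init__(self, initial=0.0):
        self.slots = [initial, initial]
        self.latest = 0      # index of the slot holding the newest complete value
        self.sequence = 0    # number of completed writes

    def write(self, value):
        idle = 1 - self.latest
        self.slots[idle] = value   # the reader's slot is left untouched
        self.latest = idle         # publish the new value in a single step
        self.sequence += 1

    def read(self):
        # The reader never waits: it takes whatever value was published last.
        return self.slots[self.latest], self.sequence


def simulate(ticks=12, write_period=3, read_period=2):
    """Run a producer and a consumer module at unrelated rates."""
    buf = DualPortBuffer()
    seen = []
    for t in range(ticks):
        if t % write_period == 0:
            buf.write(float(t))            # producer posts a new sample
        if t % read_period == 0:
            seen.append((t, *buf.read()))  # consumer samples the buffer
    return seen


if __name__ == "__main__":
    for tick, value, seq in simulate():
        print(f"tick {tick:2d}: value {value:4.1f} (write #{seq})")
```

Neither side waits for the other, so the two modules need no common clock; the reader simply sees some values twice when it runs faster than the writer.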

It is worth noting that this asynchronous multiprocessor concept with its very simple communication
structure lends itself to investigation of other advanced flight control system concepts. In particular,
the 2920 Signal Processor with its analogue interfaces should be eminently suitable for implementing
hybrid dissimilar redundant control systems where a very simple analogue control loop is augmented by an
advanced digital controller (such as that suggested by Gill, 1979). Also it has been suggested that
asynchronous multiprocessors can be organised into a fault tolerant system by the addition of suitable
control structures (Segall et al., 1979). While this observation was aimed at general purpose
computing, a simple variant on the theme could allow an equivalent philosophy for dedicated control
processing to be developed. The aim of these studies would be to reduce the level of redundancy required
in order to achieve a high integrity control scheme. Both schemes operate by accepting degraded operation
of non essential functions following a failure. If the level of redundancy required could be reduced,
then it could allow the considerable benefits of active control techniques to be applied to a much wider
range of aircraft.

7.  CONCLUSIONS 

Digital computation of control functions using a multi redundant system offers considerable benefits over
a similar analogue system. However, it introduces some difficulties of its own, particularly 1) a lack
of visibility of system operation which complicates testing, 2) time delays and synchronisation problems
which complicate the control law design and the coding, and 3) the possible occurrence of obscure context
dependent failures.

A multiprocessor flight control computer allows the software task to be partitioned into convenient
modules, thus easing the generation and testing of suitable code. It allows these modules to run in parallel,
thus reducing time delay problems. Asynchronous communication over dedicated links provides visibility of
operation, so aiding test and acceptance procedures. Finally, a restricted instruction set can substantially
reduce the number and type of possible context dependent problems.

Thus the task of developing and testing flight control software should be considerably eased. This is
particularly important during the development phase of a new aircraft, or for an experimental control
law/flight dynamics research aircraft. The disadvantages of this approach are that it does not provide
a minimum hardware solution and it does not lend itself to high order matrix computation. These factors
are probably not significant given the rapidly reducing costs of hardware and the control techniques
which are likely to be used in aircraft in the foreseeable future. The asynchronous multiprocessor
approach may even introduce hardware benefits by developing a number of modules which can be readily
configured for a wide range of high reliability control applications.

If in total these factors reduce the software task to a level which can be supported "in house" then major
improvements should be possible in the rate at which results can be achieved from, and improvements
incorporated into, a flight development programme. In practice these potential advantages can only be assessed
on the basis of practical experience, and it is hoped that the research programme outlined above, in both
ground rig and airborne applications, will demonstrate them.

ACKNOWLEDGEMENTS 


The authors are indebted to British Aerospace for permission to present this paper, and to the Royal
Aircraft Establishment for their support and encouragement during the course of this project. The views
presented, however, are entirely their own.

REFERENCES 

Corney, J. M.  Aircraft active control systems: the inner loop. RAeS Spring Convention
"Aerospace Electronics in the Next Two Decades", 1979.

Gill, F.  Ideas for future efficient flight control systems. RAE Tech. Memo. FS 256, 1979.

Jackson, K. and Simpson, K. T.  MASCOT - A modular approach to system construction operation and
test. AGARD CP 149, 1974.

Segall, Z., Yoeli, M., Strosbourger, E.  Parallel fault tolerant computation structure. Computers and
Digital Techniques, Vol. 2, No. 2, 1979.

Sharaz, A.  Implementation of Flight Control System constituents using an Analogue Processor.
BAe Tech. Note YED 69SI, 1980.

Wright, S. M. and Fletcher, M.  A technique for improving the effective resolution of an A to D
converter. BAe Tech. Note YEL 6351, April 1980.

Wright, S. M.  Implementation of log hyperbolic, trigonometric and exponential functions on an
Intel 2920 signal processor. BAe Tech. Note YED 6986, 1980.




•  Real Time Digital Processing of Analog Signals
•  Nominal Signal Bandwidths from DC to 10 kHz
•  Digital Processing Accuracy and Stability
•  Special Purpose Instruction Set for Signal Processing
•  Twenty-five Bit Wide Data Word
•  400 ns Instruction Execution Time
•  Multiple Analog Inputs (4) and Outputs (8)
•  On-Chip Sample and Hold Circuits and D/A Converter
•  On-Chip EPROM: User Programmable and UV Erasable
•  On-Chip Scratch Pad Memory (40 Locations)
•  Analog and/or TTL Output Waveforms, User Selectable
•  192 Program Locations


Fig. 1a  Summary of INTEL 2920 Processor Features
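
The instruction time and program length in the feature list allow a rough timing budget for one processing module. The sketch below assumes that the processor steps through its program once per sample without branching, so that the sample period is simply program length times instruction time; that assumption and the function name are illustrative and are not stated in this paper.

```python
INSTRUCTION_TIME_S = 400e-9   # 400 ns instruction execution time (from the feature list)
PROGRAM_LOCATIONS = 192       # on-chip program locations (from the feature list)

def sample_rate_hz(program_length=PROGRAM_LOCATIONS,
                   instruction_time_s=INSTRUCTION_TIME_S):
    """Sample rate if one pass through the program is executed per sample (assumed)."""
    return 1.0 / (program_length * instruction_time_s)

if __name__ == "__main__":
    pass_time_us = PROGRAM_LOCATIONS * INSTRUCTION_TIME_S * 1e6
    print(f"full program pass: {pass_time_us:.1f} us")
    print(f"sample rate, full program: {sample_rate_hz():.0f} Hz")
    print(f"sample rate, half-length program: {sample_rate_hz(96):.0f} Hz")
```

Under this assumption a full-length program takes 76.8 microseconds per pass, and a shorter program gives a proportionally higher sample rate.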


[Block diagram not reproduced. Diagram courtesy of Intel.]


Fig. 1b  Functional Block Diagram of INTEL 2920 Processor Architecture


[Figure residue, caption not recovered: block diagrams of a non-enhanced and an enhanced output path.
Legible labels: signal waveform generator (0.1 Hz, 0.25 V peak to peak), sawtooth generator,
low pass filter (Wn = 800 rad/s), enhancement.]


FUNCTION:

$$
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} A e^{B(t-t_0)} & 0 & 0 \\ 0 & A e^{B(t-t_0)} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos(Wt) \\ \sin(Wt) \\ Ct \end{bmatrix}
$$

where  x, y = orthogonal inputs to the oscilloscope
       A, B, C = real constants
       t = time
       t_0 = initial time
       W = angular velocity of helix vector rotation in the original x, y plane (θ = 0)
       θ = input angle used to rotate the helix in the z plane normal to the original x, y plane

[Typical displays, repetition rate 25 Hz, shown for two values of θ; one panel is labelled θ = 45°.]


Fig. 3  Example of Trig Function Implemented on INTEL 2920




[Flow diagram not reproduced. Legible box labels: control requirements; simulation; control law design.]


Fig. 6  Design, Test and Documentation Process




FUNCTIONAL  VERSUS  COMMUNICATION  STRUCTURES  IN  MODERN  AVIONIC  SYSTEMS 


by 

K. Brammer and A. Weimann
ESG Elektronik-System-Gesellschaft mbH
Postfach 800569
D-8000 Muenchen 80
W. Germany


SUMMARY 

In the early design stages, an avionic system is functionally structured into subsystems,
which in turn are broken down into functional units (equipments). With conventional
technologies and with signal wiring connections of the single source, single drain type, the
functional structure, which is of the hierarchical type, could more or less be carried
over to the implementation stage. In particular, the line replaceable units comprising an
equipment were typically wired to the master unit of the equipment, which in turn mainly
communicated with the master unit (e.g. computer) of the subsystem.

In  recent  years  this  situation  has  been  changing  rapidly.  Current  technological  trends 
that  have  major  implications  on  avionic  system  structures  are: 

-  For  intrasystem  signal  transmission,  networks  of  wires  connecting  a  single  transmitter 
with  a  single  receiver  are  being  replaced  by  bus  systems  with  time  division  broadcast 
characteristics. 

-  Progress in data processing technology renders it feasible to assign digitally
performed functions to much lower system levels than before.

-  In aircraft design, control configured vehicle (CCV) technology implies the replacement
of mechanical means for flight critical functions, such as basic stabilization
and primary flight control, by electronic data processing and transmitting means. This
has raised unprecedented requirements on reliability and survivability of avionic
elements and intrasystem communication.

-  In the field of navigation sensors, mechanically stabilized units like inertial
platforms, Doppler radar antennas, flux valves etc. are replaced by strap down sensors,
where the decoupling of sensed information from the aircraft's rotations is now
performed by electronic data processing.

-  Scanning of directional sensors, e.g. fire control radar, ESM or ECM antennas, is
increasingly performed by electronic means.

In the paper, the implications of the accompanying increase in functional and
communication interfaces on avionic system structures are analyzed. In particular, the passage from
functional design to implemented communication structure of the airborne electronic
system is scrutinized. The distributed organisation of an avionic system, the realization of
which is greatly simplified by bus type intrasystem signal transmission, is compared to
the conventional hierarchical system organisation. Advantages and drawbacks of both
organisations are reviewed, especially with respect to interface efficiency, cabling
requirements and the typical topology of avionic systems.

The  topic  is  illustrated  by  the  structures  of  a  conventional  and  a  modern  avionic  system. 


1.  INTRODUCTION

The paper addresses a problem which has arisen in avionic system design due to
technological changes in intrasystem communication. In the past, there existed a great degree of
correspondence - at least in principle - between the process of functional structuring
of an avionic system in the design stage on the one hand, and the communication structure
within the system on the other hand. Both structures were essentially of the hierarchical
type.

In the meantime the advent of new concepts and technologies has brought about a certain
discrepancy between the functional design of the system and the implementation of
intrasystem communication. Whereas the former continues to be hierarchical, the latter treats
the terminals as peers.

It seems that this trend has been produced mainly by three developments: the simplification
of cabling, e.g. by the use of bus systems, the distribution of processing to equipments
and line replaceable units, and the transfer of network and switching concepts from
telecommunications to computer networks and, subsequently, to avionics systems.

In this paper an attempt is made to draw a partial résumé of the former clean situation
as a reference and to discuss the new mixed situation against this background.




2.  HIERARCHICAL  ASPECTS  IN  AVIONIC  SYSTEMS 

2.1  Avionic  System  Design  Principle 

The  basic  method  and  the  main  steps  of  avionic  system  design  have  become  fairly  well 
settled  and  generally  accepted.  Here  we  sum  up  the  major  features  as  a  starting  point 
for  the  subsequent  analysis. 

The task of system design is always subject to the relevant general constraints such as
national or international standards, practices, logistics procedures and so on. These
are not always explicitly listed by the customer; rather, their knowledge is often
implicitly expected to be part of the professional experience of the designer.

The specification of the avionic system requirements is the basic document containing
the technical points of reference for the system to be designed. It defines the task,
the functions, the performance and the modes of the system, together with its technical
boundary conditions (e.g. given constraints regarding weight), the physical operating
environment, the external interfaces (e.g. communication, power supply, man machine
interface) and the availability parameters.

In response to this input, the designer conceives and nominates the system parts which
in combination are potentially able to fulfil the requirements.

The interrelation of these parts is then manifested by the design of the system
architecture and organisation, i.e. by creating the structure and assigning functional
responsibilities and management authorities to hardware parts, software parts and the operator.

This  step  must  be  accompanied  by  the  definition  of  all  arising  interfaces  between  the 
system  parts.  Now  the  fulfilment  of  the  requirements  can  be  checked.  If  the  result  of 
this  cycle  is  positive,  one  is  able  to  specify  the  system  parts. 

Usually  the  decomposition  of  a  system  requirement  or  specification  into  a  set  of  partial 
specifications  is  not  done  in  a  single  cycle,  but  in  repeated  cycles  at  successively 
lower  system  levels. 

Although  in  reality  it  is  not  always  possible  to  follow  this  design  procedure  in  complete 
purity,  this  so-called  top  down  design  philosophy  has  become  widely  accepted  as  a  basic 
guideline. 

2.2  Functional  Architecture 

Figure  1  illustrates  the  top  down  design  process  and  the  resulting  functional  architecture 
of  the  avionic  system  (LAUBER,  1980). 

At the top level we have the functional description of the overall system. At level 2 the
decomposition into functional areas or subsystems has been performed. The intermediate
level between levels 1 and 2 describes the interrelations between the system and its
subsystems and among the subsystems themselves.

The next cycle leads from the subsystem level to the level of functional modules,
implemented either by software or by hardware, i.e. equipments.

From  a  systems  engineering  point  of  view  it  is  necessary  to  proceed  until  the  level  of 
construction  modules,  at  least  in  case  of  hardware,  because  the  installation  and  power 
supply  of  all  black  boxes  or  line  replaceable  units  must  be  defined. 

The  breakdown  of  black  boxes  internally,  e.g.  into  circuit  boards,  is  usually  left  to 
the  equipment  manufacturer  and  is  of  no  concern  in  the  following  discussion. 

It is evident from Figure 1 that the top down design method automatically produces a
hierarchical set of specifications for the parts of the avionic system.

2.3  Interface  Efficiency 

It  is  remarkable,  that  one  finds  much  agreement  on  the  top  down  procedure,  but  scarcely 
any  philosophical  or  useful  theoretical  justification  for  it.  The  feeling  exists  that 
it  is  an  economical  and  efficient  way  to  proceed. 

In Fig. 2 this point is confirmed with respect to the maximum number of potential
interfaces among the members of hierarchies as compared to peer groups.

As a reference we use the total number of possible mutual interfaces in a peer group.
This number R is obviously equal to N(N-1)/2, where N is the number of members. The number
of interfaces in a hierarchy is called R_H. Dividing R_H by the reference number R, we
obtain a measure of interface efficiency (BRAMMER, 1981). This measure is plotted in Fig. 2
as a function of the number of members, N, in a double logarithmic scale. Two parameters
are used to describe the hierarchy: the number of levels, and the number of associates to
each master. For simplicity the latter parameter is kept equal for all masters,
regardless of the levels.




If the hierarchy has only 2 levels, the number of all possible mutual interfaces is equal
to that of the peer group, so the interface efficiency quotient remains at one.

In all other cases the hierarchy features fewer interfaces than the peer group. The
interface efficiency improves uniformly and markedly with a growing number of levels,
with a shrinking number of associates and with a growing total number of members, as shown
by the set of decreasing lines.

For example, consider a group of the order of 32 members. In the peer group or in a
two-level hierarchy they have about 500 possible interfaces. The same number of members,
organised in three hierarchical levels with 5 associates to each master, have only 20% of these,
or about 100 possible interfaces. If they are organised in 5 levels with 2 associates, the number of
interfaces reduces still further to 10%.
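
The figures in this example can be reproduced with a short calculation. The paper does not state how the interfaces in a hierarchy are counted; the sketch below assumes that each master and its direct associates form one fully interconnected group and that there are no other interfaces. That counting rule and the function names are assumptions, chosen because they reproduce the two-level case and the approximate percentages quoted in the text.

```python
def peer_interfaces(n):
    """Possible mutual interfaces among n peers: N(N-1)/2."""
    return n * (n - 1) // 2

def hierarchy_members(levels, associates):
    """Members of a full hierarchy in which every master has `associates` subordinates."""
    return sum(associates ** level for level in range(levels))

def hierarchy_interfaces(levels, associates):
    """Interfaces if each master and its associates form one peer group (assumed rule)."""
    masters = sum(associates ** level for level in range(levels - 1))
    return masters * peer_interfaces(associates + 1)

def interface_efficiency(levels, associates):
    """Hierarchy interfaces divided by the peer-group reference number."""
    members = hierarchy_members(levels, associates)
    return hierarchy_interfaces(levels, associates) / peer_interfaces(members)

if __name__ == "__main__":
    for levels, associates in [(2, 31), (3, 5), (5, 2), (4, 10)]:
        n = hierarchy_members(levels, associates)
        print(f"{levels} levels, {associates} associates: {n} members, "
              f"{hierarchy_interfaces(levels, associates)} interfaces, "
              f"efficiency {interface_efficiency(levels, associates):.3f}")
```

With three levels and 5 associates this gives 31 members and 90 interfaces, and with five levels and 2 associates 31 members and 45 interfaces, which agrees with the rounded values in the text.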

The efficiency effect is clear and uniform. It is the more marked, the larger the group
of members is. For instance, with 1000 members, the number of interfaces in hierarchies
with up to ten associates is 1% and less, compared to the unstructured case. Although the
hierarchy has weaknesses in other respects, its interface efficiency can be judged as an
advantage.

Clearly the number of interfaces is a measure of the labour involved in complete system
specification down to component level, in contract negotiations, and in acceptance test and
system integration activities.

2.4  Classical  Communication  Structure 

In the classical avionic system, the implemented communication structure basically
followed the hierarchical system organisation, see Fig. 3. This was essentially due to:

-  the presence of a single central computer as the only resource for digital general
purpose data processing,

-  the  prevalence  of  single  source-single  drain  data  and  signal  transmission  lines,  and 

-  the co-use of the central computer as a central message switching node in order to
allow multi-user interconnections despite the absence of bus technology (CARRUTHERS,
1979).

For  example,  in  classical  avionic  systems,  the  subsystem  functions  are  centralised  in 
the  form  of  subprograms  within  the  main  computer.  These  subprograms  communicate  via 
dedicated  links  directly  with  the  associated  equipments.  In  Fig.  3  the  equipments  in  the 
upper  line  belong  to  the  navigation  subsystem,  the  first  four  equipments  in  the  lower 
line  belong  to  the  displays  and  controls  subsystem,  etc. 

The  wiring  shown  goes  between  the  central  computer  and  the  master  unit  of  each  equipment. 
Their  associated  line  replaceable  units  (black  boxes)  are  in  turn  wired  to  the  equipment 
master  unit. 

This  way  a  hierarchical  communication  network,  formed  by  point-to-point  links,  is 
realised,  reflecting  very  well  the  functional  specification  tree. 

Of course, here too, reality is not as pure as the idea. For reasons of reliability,
damage resistance and speed the system considered has numerous additional cross
connections which have been omitted here.


3.  CURRENT  TRENDS  INFLUENCING  AVIONIC  SYSTEM  ARCHITECTURE 

For several years the architecture of avionic systems has been changing. Fig. 4
illustrates some of the major contributing trends and their interrelationships.

3.1  Technology 

The left hand side of Fig. 4 shows examples of relevant technological advances. In the
field of sensors they are phased arrays and strap down components, in the area of
intrasystem transmission we had the advent of high reliability electronic links, and in the
processing field, high speed switching elements and large scale integration are being
introduced.

3.2  Concepts  and  Equipments 

The center part of Fig. 4 presents a number of current concepts and equipments
influencing systems architecture. To name some of them, we have for instance

-  Abstract  implementation  of  coordinate  frames 

-  Fly-by-wire 

-  Active  stabilization  and  electronic  control 

-  Multiplexed  transmission,  and  of  course 

-  Microprocessors  and  -computers. 


3.3  Impact on Systems


In the context of this paper, the given factors have three main impacts on avionic
systems, shown on the right hand side of Fig. 4. They are

-  A  substantial  increase  in  signal  and  data  processing 

-  Time  division  multi-source  multi-sink  transmission 

-  Locally  distributed  computing. 

All three points have led to important structural changes: On the one hand, distributed
computing allows location of processing functions at their proper level and frees the
designer from concentrating them artificially in one single computer, see e.g. (SYRBE,
1978), (CIMSA, 1979) or (BRAMMER, 1980). For example, the navigation subprogram can be
removed from the central computer and allocated to a navigation subsystems computer.
This type of distributed computing tends to spread out the functional hierarchy more
visibly throughout the system topology.

On the other hand, distributed data processing in the strict sense implies not only
physical dislocation of processing functions and associated hardware, but also
distribution of the data base and of the control function (ENSLOW, 1978), (SCHERR, 1978).

This philosophy tends to diminish the hierarchical features of system organisation.
Furthermore, the transfer of network and switching concepts from telecommunications to
computer networks (WECKER, 1979) and from there to avionic systems gives rise to
peer-like communication procedures. Finally, multiple access, broadcast type transmission
systems render economic implementation of direct all-to-all communication feasible.


4.  COMMUNICATION  IN  AVIONIC  SYSTEMS 

4.1  Available  Communication  Structures 

The communication structures available for avionic systems today are summarised in
Fig. 5. Each line in the structures represents a connecting cable. A simplex connection
contains one basic channel of the type shown top left, consisting of a transmitter,
driver, line and receiver. A full duplex connection contains two such channels in opposite
directions. Half duplex connections use the same line for both directions.

Every available structure shown allows direct or indirect communication among all
participating units, indicated as solid dots arranged in a circle.

Using conventional point-to-point links, usually bit-serial and word-serial,
one obtains first the classical structures:

-  Network  of  direct  all-to-all  connections 

-  the  star 

-  the  layered  star. 

The bottom part of Fig. 5 shows the newer structures using links with broadcast
capability:

-  the  matrix  formed  by  a  set  of  single-source,  multiple-sink  channels,  e.g.  of  ARINC 
429  standard  ("DITS") 

-  the  multiple  access  bus,  e.g.  of  MIL  1553  standard  ("MUX"),  carrying  multiple-source, 
multiple-sink  traffic  in  both  directions  on  a  time  division  basis. 

4.2  Cable  Lengths 

Suppose that all the structures shown in Fig. 5 are implemented with links of the same
technological state of the art, especially with the same serially transmitted data bit
rate. Remember further that all structures allow messages to be transmitted from each
unit to any other unit. Then, the main advantage of the bus structure over all the
other structures is the minimum cable length. This is evaluated in Fig. 6 and compared
to the cable length of the layered star, the star and the all-to-all network (BRAMMER,
1981).

For  simplicity  and  generality,  the  topology  of  participating  units  has  been  assumed 
here  as  a  uniform  distribution  at  the  points  of  a  square  raster  with  constant  raster 
width  in  both  orthogonal  directions. 

The  graph  shows  the  total  cable  length  necessary  to  allow  complete  communication  among 
all  units.  This  length  is  normalised  by  the  raster  width  and  plotted  against  the  total 
number  of  units  on  a  double  logarithmic  scale. 

For instance, for 20 units our model yields a total cable length of 450 times the
raster width for the all-to-all network, as compared to 19 for the bus. The cabling
efficiency of the bus gets even better for larger numbers of units.

Note  however,  that  the  star  and  especially  the  layered  star  are  doing  fairly  well  in 
this  respect,  too. 
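
The raster model can be explored with a few lines of code. The paper gives neither the raster dimensions nor the distance measure used; the sketch below assumes 20 units on a 5 by 4 raster, straight-line cable runs, a star whose hub is the most central unit, and a bus laid as a serpentine path through neighbouring raster points. Under these assumptions the all-to-all total comes out at roughly 457 raster widths, close to the 450 quoted above, and the bus at 19.

```python
import math
from itertools import combinations

def raster_points(columns, rows):
    """Units placed at the points of a square raster with unit raster width."""
    return [(x, y) for y in range(rows) for x in range(columns)]

def all_to_all_length(points):
    """One straight cable between every pair of units."""
    return sum(math.dist(p, q) for p, q in combinations(points, 2))

def star_length(points):
    """One straight cable from the best-placed hub unit to every other unit."""
    return min(sum(math.dist(hub, p) for p in points) for hub in points)

def bus_length(columns, rows):
    """Serpentine bus through neighbouring raster points: N - 1 unit segments."""
    return columns * rows - 1

if __name__ == "__main__":
    units = raster_points(5, 4)
    print(f"all-to-all: {all_to_all_length(units):.1f} raster widths")
    print(f"star:       {star_length(units):.1f} raster widths")
    print(f"bus:        {bus_length(5, 4)} raster widths")
```

The layered star is left out because its length depends on how units are grouped under masters, which the raster model alone does not fix.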




4.3  Topological  Considerations 

We have seen that from a functional point of view the layered star structure is the most
natural. In Fig. 7 this is case A, shown top left in idealised form. In this example,
there are two subsystems: one constituted by round units in the upper half and the other
consisting of square units in the lower half. Each subsystem, in turn, has three
equipments. Each equipment has a master unit and three associated units.

Nowadays, we can assume computing functions down to the equipment level. Then this
topology represents a federated computer architecture, where distributed computing is
allocated according to disjoint topological areas.

However, in a real avionic system the functional and topological orderings of the line
replaceable units do not coincide as in case A, but are mixed up as in case B. The
cabling pattern then no longer follows the layered star, but is better characterised as
a superposition of several stars. So the advantage in cable length of the layered star
cannot be realised.

The same mixed configuration of LRU's as in case B is shown in case C and it is obvious
that from the cabling point of view the bus structure is not affected by the mixed topology
of functional units. But the question is whether it is really desirable that a
single-level bus connects all units down to LRU level.

4.4  Modern  Communication  Structure 

From Fig. 8, which represents a typical interconnection structure of a modern fighter
avionics system, one can conclude that not each and every black box is connected to a
common bus. The avionics bus - duplex for redundancy - picks up the subsystems such as
navigation, fire control, flight control and some equipments that have many communication
interfaces such as air data, multifunction keyboards and displays.

Thus,  the  present  state  of  the  art  in  avionic  systems  still  features  hierarchical  levels 
of  communication:  the  central  system  control,  the  subsystem  computers  and  some  equipments 
communicate  on  an  upper  level  bus,  while  in  the  lower  levels  either  dedicated  buses 
(triplex  for  flight  control)  or  even  still  star  type  cables  are  used. 


5. CONCLUDING REMARKS

5.1  Advantages  of  Multi-Level  Communications 

It has been noted that a common single-level bus running past all units of the system has
the minimum possible cable length of all communication structures. Nevertheless, a multi-level
structure persists due to the following advantages:

-  Hierarchical  structuring  is  efficient,  not  only  in  the  design  process,  but  also  for 
contractual  specifications,  configuration  control,  acceptance  testing,  integration, 
maintenance  and  retrofit. 

-  This  efficiency  is  mainly  due  to  the  reduction  of  the  various  sorts  of  interfaces 
between  units,  especially  the  communications  interfaces. 

-  Generally  the  data  rate  decreases  when  we  pass  from  lower  to  higher  levels,  therefore 
transmission  capacity  problems  are  alleviated  by  layering. 

- Conversely, reliability requirements often differ among subsystems, giving rise to dedicated
components and links.

- Functional autonomy of equipments is maintained if they have dedicated lines to their
LRU's. Otherwise, equipment development and acceptance testing would be greatly complicated.

These  points  call  for  at  least  two  levels  of  communication:  System  bus,  and  links  between 
the  LRU's  constituting  an  equipment.  An  intermediate  third  level  may  be  adequate  for  some 
subsystems  such  as  flight  control. 

5.2  Characteristics  of  Distributed  Processing 

Distributed  processing  has  become  cost-effective  and  is  increasing  in  avionic  systems. 

The advantages are:

-  The  hierarchical  decomposition  of  subsystem  functions  can  be  directly  implemented, 
yielding  a  set  of  smaller  programs  instead  of  one  large  central  program. 

- Autonomy of subsystems is possible, with better reliability and survivability
characteristics.

-  In  conjunction  with  the  use  of  communication  buses  the  central  computer  is  eliminated 
as  a  central  node  or  switching  element. 

- Locally dispersed computing resources with reconfiguration capability reduce
vulnerability.

However, due to communication delays, the interplay of distributed algorithms is less
deterministic than in the centralized case in that each part must operate without a
complete instantaneous knowledge of the state of all other parts.




Furthermore, even in a distributed system of avionics application programs it is necessary
that their functional authority and the validity of data bases be managed system-wide.

5.3  Remaining  Problems  and  Outlook 

We  have  seen  that  at  the  present  state  of  the  art  we  live  with  an  -  at  least  partial  - 
discrepancy  between  functional  and  data  flow  structures  in  avionic  systems. 

Bus systems make a logical connection of all-to-all type easily feasible, allowing multiple
use of sensors for improved system performance and/or distribution of processing
resources for better failure or damage resistance. But, even if we restrict this to
equipment level and above, the system design has to cope with a substantial growth in
communication interfaces. The overlay of functions versus communications must be subject
to careful book-keeping, timing and control. This problem is aggravated by dynamic
reconfiguration capability of the functional system architecture, especially when
time-critical, high-priority functions require a high degree of confidence to be served at
the right moment without delay.

Regarding avionic system operation we note the persistence of three types of central
system elements:

-  System  functions  synthesizing  top  level  applications  on  the  basis  of  subsystem  functions 

-  Control  of  distributed  data  processing 

-  Control  of  bus  transmissions 

These elements remain critical and need special redundancy protection and installation
considerations.

Summing  up  briefly,  it  might  be  suggested  that  for  avionic  systems  the  conflicting  goals 
of  deterministic  system  behaviour  requiring  few  functional  and  communication  interfaces 
and  tight  control  on  the  one  hand,  and  of  enhanced  availability  requiring  distribution 
of  resources,  reallocation  of  functions  and  many  communication  interfaces  on  the  other 
hand,  require  more  research  and  practical  experience  in  order  to  harmonise  them  and  to 
establish  new  adequate  and  generally  accepted  avionic  system  implementation  procedures. 


6. REFERENCES

BRAMMER,  K.,  1980,  "Architecture  of  Flight  Guidance  and  Control  Systems,"  Working  Paper 
prepared  for  AGARD  GCP  WG  05,  Functional  Integration  of  Positioning  and  Guidance  and 
Control  Systems,  ESG,  Munich,  24  March  1980. 

BRAMMER, K., 1981, "Functional Interrelations and Communications Interconnections in
Avionic System Structures," Tech. Rep. ES-T/81, ESG, Munich, 30 April 1981.

CARRUTHERS,  J.F.,  1979,  "SHINPADS  -  A  New  Ship  Integration  Concept,"  Naval  Engineers 
Journal,  April  1979,  pp.  155-163. 

CIMSA,  1979,  "On-Board  Computer  Systems:  Architecture,  Technology,  Support  Software," 
Topic  3  of  Vol.  V  (Data  Processing)  of  Initial  Technological  Studies  on  European 
Air  Traffic  Management,  European  Community,  Brussels,  Dec.  1979. 

ENSLOW, P.H., 1978, "What is a Distributed Data Processing System?", Computer, Jan. 1978,
pp. 13-21.

LAUBER, R. (Ed.), 1980, "Einführung in das Entwurfsunterstützende Prozeß-Orientierte
Spezifikationssystem EPOS 80," Inst. f. Regelungstechnik u. Prozeßautomatisierung,
Univ. Stuttgart.

SCHERR, A.L., 1978, "Distributed Data Processing," IBM Syst. J., Vol. 17, No. 4,
pp. 324-342.

SYRBE,  M.,  1978,  "Basic  Principles  of  Advanced  Process  Control  System  Structures  and  a 
Realisation with Optical-fibre-coupled Distributed Microcomputers," Proc. 7th IFAC
Congress  (Helsinki,  June  1978),  pp.  393-401,  Pergamon  Press. 


WECKER, S., 1979, "Computer Network Architectures," Computer, Sept. 1979, pp. 58-72.


Figure 3: Classical Avionics Interconnection Structure (shown without wiring between equipments and/or LRU's)

Figure 4: Current Trends in Avionic Systems (technological advances; concepts and equipments; impact on systems)


CONTINUOUS RECONFIGURATION IN A MULTI-MICROPROCESSOR
FLIGHT CONTROL SYSTEM

LT.  SCOTT  L.  MAHER  AND  CAPT.  STANLEY  J.  LARIMER 
Air  Force  Wright  Aeronautical  Laboratories 
Flight  Dynamics  Laboratory 
Wright-Patterson  AFB,  OH 
U.S.A. 



SUMMARY 

Recent  research  at  the  US  Air  Force  Wright  Aeronautical  Laboratories  (Flight  Dynamics 
Lab) has resulted in the development of a promising microprocessor-based flight control
system  design.  This  system  is  characterized  by  a  collection  of  cooperatively  autonomous 
distributed  microcomputers  interconnected  by  an  arbitrary  number  of  common  serial 
multiplex  busses.  Each  processor  in  the  system  independently  determines  its  assignments 
using  a  simple  algorithm  that  dynamically  redistributes  system  functions  from  processor 
to  processor  in  a  never-ending  process  of  reconfiguration.  This  approach  offers  several 
potential  benefits  in  terms  of  system  reliability,  and  the  architecture  in  general 
incorporates  many  state-of-the-art  features  which  promise  improved  system  throughput, 
expandability,  and  above  all,  ease  of  programming. 

The  Continuously  Reconfiguring  Multi-Microprocessor  Flight  Control  System  (CRM2FCS) 
represents  a  significant  data  point  in  multi-processor  control  system  research.  Promising 
ideas  from  a  variety  of  references  have  been  included  and  integrated  in  its  design.  Its 
laboratory implementation will provide a demonstration of the extent to which these ideas
may  improve  throughput,  reliability,  and  ease  of  programming  in  flight  control 
applications. 

1.  INTRODUCTION 

Before beginning a detailed discussion of the Continuously Reconfiguring
Multi-Microprocessor Flight Control System (CRM2FCS) it is desirable to briefly discuss
the design goals and philosophy which led to this architecture. The original objective
of this in-house effort was to develop an Air Force understanding of and capability in
the area of multi-microprocessor flight control systems. It was determined that a high
risk-high  payoff  approach  could  be  taken  in  an  effort  to  advance  the  state-of-the-art 
while  achieving  the  primary  objective.  The  approach  taken  was  simply  to  make  a  trade  off 
between  low  cost  hardware  and  simplification  of  software  as  well  as  to  distribute  control 
to  its  extreme  in  an  effort  to  obtain  data  as  to  the  extent  to  which  the  potential 
advantages  of  such  a  system  could  be  achieved.  Other  goals  were  to  reduce  overall 
hardware,  software,  and  life  cycle  costs  of  flight  control  systems  while  maintaining  high 
reliability  and  fault  tolerance.  Design  considerations  also  included  expandability  for 
integrated  control  applications  and  reconfigurability  to  meet  future  self-healing 
requirements. 

The  concept  of  continuous  reconfiguration  is  developed  in  some  detail  in  this  paper.  An 
example  is  given  and  the  advantages  of  such  a  scheme  are  discussed  briefly.  Autonomous 
control  is  introduced  as  an  ideal  method  for  controlling  the  continuously  reconfiguring 
architecture.  The  requirements  of  a  continuously  reconfiguring  autonomously  controlled 
multi-processor  architecture  are  listed  and  a  novel  bus  contention  scheme  and  the  concept 
of  virtual  common  memory  are  put  forward  as  the  means  of  meeting  the  requirements. 
Methods  for  simplifying  software  programming  are  also  discussed  as  well  as  a  description 
of  a  software  simulation  of  the  CRM2FCS.  Finally  the  actual  laboratory  implementation  of 
the  architecture  and  the  testing  and  data  gathering  facility  to  support  the  architecture 
are  described. 

2. THE CONCEPT OF CONTINUOUS RECONFIGURATION

Continuous  reconfiguration  is  defined  as  a  scheme  whereby  the  tasks  to  be  performed  in  a 
multi-processor  system  are  dynamically  redistributed  among  all  functioning  processors  at 
or near the minor frame rate of the overall system. This approach allows continuous spare
checkout,  latent  fault  protection,  and  elimination  of  failure  transients  due  to 
reconfiguration  delay.  By  treating  reconfiguration  as  the  norm  rather  than  the  exception, 
failures  can  be  handled  routinely  rather  than  as  emergencies,  resulting  in  predictable 
failure  mode  behavior.  Using  this  approach,  it  is  projected  that  the  need  for  unscheduled 
system  maintenance  may  be  greatly  reduced. 

2.1  Example  Of  Continuous  Reconfiguration 

An  example  of  what  is  meant  by  continuous  reconfiguration  is  shown  in  Figure  1.  A  system 
of  9  processors  is  shown  performing  6  different  tasks,  A  thru  F  during  three  consecutive 
time  frames.  During  the  first  time  frame  processor  1  is  doing  task  B,  processor  2  task  D, 
processor  3  is  a  spare,  and  so  on.  In  continuous  reconfiguration  the  tasks  are 
redistributed  among  the  processors  at  the  beginning  of  every  time  frame.  For  example,  in 
the second time frame, there is an entirely different assignment of tasks to the
processors. This reassignment is accomplished by having all of the processors that are
currently healthy in the system compete for task assignments. If a processor fails during
any  time  frame,  it  is  no  longer  able  to  compete  for  task  assignments.  In  Figure  1,  if 
processor  4  failed  during  the  second  time  frame,  then  during  the  next  frame,  it  would  not 
be  able  to  compete  for  task  assignment.  The  6  tasks  which  need  to  be  done  are  taken  by 
healthy processors and the 2 remaining processors become spares. In other words, a
failed processor simply disappears from the system without any other processors being
aware that it is gone.
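
The example can be played through in a few lines of code. The sketch below assumes a simple rotation rule for redistributing the tasks; the paper does not give the assignment rule at this point, so `assign_tasks` and its rotation are hypothetical stand-ins that only reproduce the behaviour described: a new assignment every frame, six tasks taken by healthy processors, and the rest acting as spares.

```python
TASKS = ["A", "B", "C", "D", "E", "F"]


def assign_tasks(frame, healthy):
    """Map each healthy processor to a task or 'spare' for one time frame.

    The rotation rule is a placeholder: it only guarantees that the
    assignment changes from frame to frame and that each task is taken once
    (provided at least len(TASKS) processors are healthy).
    """
    order = sorted(healthy)
    shift = frame % len(order)
    rotated = order[shift:] + order[:shift]
    assignment = {proc: "spare" for proc in order}
    for task, proc in zip(TASKS, rotated):
        assignment[proc] = task
    return assignment


healthy = set(range(1, 10))      # processors 1..9
for frame in range(3):           # three consecutive time frames
    if frame == 2:
        healthy.discard(4)       # processor 4 failed during the second frame
    print(frame, assign_tasks(frame, healthy))
```

In the third frame processor 4 simply does not appear in the assignment: the six tasks are still covered and two spares remain, with no special failure handling anywhere in the code.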


Fig.  1  Continuous  Reconfiguration 

2.2  Advantages  To  Continuous  Reconfiguration 

There  are  a  number  of  advantages  to  the  continuous  reconfiguration  approach.  One  of  these 
is the ability to have continuous spare check-out. In traditional systems, where certain
processors are permanently assigned to the spare status until they are needed, it is
possible for one of these processors to fail while functioning as a spare. When a system
processor fails and the failed spare is brought on line, catastrophic results may occur.
The technique of continuously switching which processors are acting as spares allows
every processor in the system to be constantly exercised. If a processor does fail, it is
identified  quickly  and  removed  from  the  system,  before  it  can  cause  any  problems. 

Latent  fault  protection  is  another  advantage  of  the  continuous  reconfiguration  approach. 
Latent  faults  are  a  class  of  faults  that  are  characterized  by  the  partial  failure  of  a 
processor. The processor failure is not immediately detectable and may impede the
system's ability to recover from any subsequent failures. Continuously exercising each
processor, so that over a period of time every processor performs every task, forces a
partially failed processor to reveal its failure and be removed from the system before it
can interact with another partially failed processor in a manner that may preclude
recovery. 

A  third  benefit  of  continuous  reconfiguration  is  zero  reconfiguration  delay.  Most  systems 
that are reconfigurable treat a failure as an emergency requiring special processing.
This  produces  delays  and  possible  failure  transients  in  bringing  the  system  back  to  its 
fully  operational  state.  With  continuous  reconfiguration  there  is  no  emergency.  The 
system  reconfigures  naturally  every  time  frame  so  that,  when  a  failure  occurs,  the  system 
takes  it  in  stride  and  with  no  failure  transient. 

2.3  Controlling  A  Continuously  Reconfiguring  System 

A unique approach has been taken to controlling the continuously reconfiguring
multi-microprocessor flight control system. The conventional approach would be to have a
central controller in charge of assigning tasks, handling reconfiguration and controlling bus
access. A high throughput computer would be needed to meet the overhead requirements of
the continuously reconfiguring architecture. A central controller also introduces the
possibility of a single point failure in the system, requiring redundancy incompatible
with the architecture and reducing the reliability of the continuous reconfiguration
concept.

An alternative approach to a central controller is autonomous control. This is a scheme
whereby each processor independently determines its own next task based upon the current
aircraft state. This can be better understood by using an analogy. Like the traditional
centrally controlled computer architecture, a company has a president who has several
vice-presidents working for him. The president has access to all information concerning
the state of the company and an understanding of how the company should function. He
uses this knowledge to allocate tasks to the vice-presidents and arbitrate any
disagreements that may arise between them. Autonomous control is analogous to replacing
each of the vice-presidents with a clone of the president. The vice-presidents are now
capable of making the same decisions that the president would have made under the same
circumstances, since they have access to the data that he had and would go through the
same decision making process that he would. The need for the president has been
eliminated and he has been replaced by autonomous vice-presidents. This approach is not
practical in the human world because no two humans think alike. In the computer world,
however, it is a realizable possibility.
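
In software terms the analogy means that every processor runs the same deterministic decision procedure on the same data, so the individual decisions agree without anyone coordinating them. The sketch below illustrates this under stated assumptions: the class `Processor`, the method `next_task` and the index-plus-frame rule are hypothetical and are not the task assignment rules of the CRM2FCS operating system.

```python
TASKS = ["A", "B", "C", "D", "E", "F"]


class Processor:
    """One autonomous processor; it never asks another unit what to do."""

    def __init__(self, ident):
        self.ident = ident

    def next_task(self, frame, healthy_ids):
        """Decide this processor's own task from commonly known data.

        Every processor applies the same rule to the same inputs, so each
        task is claimed by exactly one processor and no arbiter is needed.
        """
        order = sorted(healthy_ids)
        slot = (order.index(self.ident) + frame) % len(order)
        return TASKS[slot] if slot < len(TASKS) else "spare"


processors = [Processor(i) for i in range(1, 10)]
ids = [p.ident for p in processors]
decisions = {p.ident: p.next_task(frame=7, healthy_ids=ids) for p in processors}
print(decisions)
```

Each call to `next_task` uses only the caller's identity and data that all processors hold in common, which is the property that removes the need for a "president".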




2.4  Requirements  Of  A  Continuously  Reconfiguring  System 

In  order  to  make  continuous  reconfiguration  of  autonomously  controlled  processors 
possible,  several  requirements  must  be  satisfied.  These  requirements  include  an  efficient 
bus  contention  scheme,  availability  of  system  state  information  to  all  processors, 
availability  of  all  software  to  every  processor,  and  a  well-defined  set  of  task 
assignment  rules.  The  methods  used  to  meet  each  of  these  requirements  in  the  laboratory 
implementation are covered in some detail. Considerable attention has also been devoted
to techniques for simplifying the actual software design for use in this system. Such a
scheme  is  clearly  required  if  the  organization  of  a  large  number  of  processors, 
performing  complex  flight  control  algorithms,  is  to  be  implemented  without  total  chaos. 
The  two-dimensional  task  assignment  chart  is  introduced  to  simplify  this  process. 

The  first  requirement  is  for  a  set  of  well  defined  task  assignment  rules.  Each  of  the 
processors  must  have  an  efficient  means  of  determining  the  next  task  that  it  is  required 
to  do.  There  must  not  be  an  opportunity  for  any  processor  to  conflict  with  other 
processors  in  the  system  and  cause  system  failures.  The  task  assignment  rules  are  a 
function  of  the  operating  system  software  (Larimer,  S.J.,  JUNE,  1981)  and  are  discussed 
further  in  section  5. 

A  second  requirement  is  that  all  processors  must  have  all  software.  In  order  for  a 
processor  to  be  capable  of  doing  any  system  task  at  any  point  in  time,  it  must  have  the 
software available to do the task. This may seem unrealistic at first, but a study of the
trends in memory technology reveals that memory will continue to double in density every
one to one and a half years for at least five years and that the cost of memory will
continue  to  go  down.  This  trend  makes  supplying  all  software  to  every  processor  a 
reasonable  trade  to  get  the  benefits  offered  by  the  CRM2FCS. 
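
The size of this trend is easy to quantify. Assuming that density doubles every T years, with T between 1 and 1.5 as quoted above, the growth factor over five years is bounded as follows (D(t), the memory density after t years, is a symbol introduced here for illustration only):

```latex
\[
  \frac{D(5)}{D(0)} = 2^{5/T},
  \qquad
  2^{5/1.5} \approx 10 \;\le\; \frac{D(5)}{D(0)} \;\le\; 2^{5/1} = 32 .
\]
```

So the same chip area would hold roughly 10 to 32 times as much software after five years, if the projected trend holds.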

A  third  requirement  of  this  system  is  that  all  processors  must  have  all  data.  A  processor 
must  be  capable  of  doing  any  task  at  any  point  in  time  and  in  order  to  perform  most  tasks 
must  have  access  to  data  concerning  the  present  state  of  the  aircraft.  This  requirement 
could  be  met  almost  ideally  by  the  common  memory  architecture  illustrated  in  Figure  2b. 
The  common  memory  is  accessed  equally  by  every  processor  in  the  system.  This  is  excellent 
from  a  software  standpoint,  since  the  programmer  can  treat  the  common  memory  as  though  it 
were a part of the processor's local memory, simply reading variables from a set location
and writing results into other locations.

Although  ideal  from  a  software  standpoint  it  is  very  poor  from  a  hardware  standpoint.  The 
number  of  processors  that  can  access  the  common  memory  is  limited  to  the  number  of  ports 
which  can  realistically  be  interfaced  to  it.  This  approach  also  introduces  complex  timing 
problems  when  more  than  one  processor  attempts  to  access  the  common  memory  at  the  same 
time. 

A  more  suitable  architecture  from  the  hardware  standpoint  is  the  common  bus  structure 
also  shown  in  Figure  2a.  The  processors  in  this  architecture  are  interconnected  by  a 
common  serial  bus.  The  number  of  processors  that  can  be  attached  to  this  bus  is  virtually 
unlimited  and  the  interface  hardware  is  relatively  simple.  This  is  a  poor  architecture 
from a software standpoint, however, since data must be formatted before transmitting it
and must be processed as it is received. This architecture is also subject to bus
contention  problems  when  more  than  one  processor  attempts  to  transmit  data  on  the  bus 
simultaneously.  The  fourth  requirement  is,  therefore,  that  an  efficient  bus  contention 
scheme  is  needed. 


Fig.  2  Evolution  of  Virtual  Memory 


Fig.  3  State  Information  Matrix 




3. VIRTUAL COMMON MEMORY

An architecture which meets all four requirements was developed in the Flight Dynamics
Laboratory. It combines the software advantages of the common memory
architecture with the hardware advantages of the common bus architecture. The best of
these two architectures forms the basis of the virtual common memory architecture
illustrated in Figure 2.

One of the key advantages of the virtual common memory architecture is that it is a
common bus architecture which looks like a common memory architecture to the software
programmer. In this architecture each microprocessor simply interacts with a set of
information in the virtual common memory that contains all necessary information about
the state of the aircraft. This area of the virtual memory is called the state
information matrix or SIM.

The SIM is a mathematical abstraction used for organizing all the available information
about the state and environment of an aircraft. With this structure all microprocessor
functions can be broken down into three sets. The first set, the F functions in Figure 3,
takes raw sensor data, processes and filters it, and stores it in designated locations
within the SIM. Another set, the H functions in Figure 3, takes information
which is in the SIM, processes it, and refines it to produce higher quality data. This could
be, for example, a Kalman filter algorithm. The refined data is stored back in the SIM
where it can be accessed by other processors in the system. A third set of processor
functions takes information from the SIM and processes it for use by the outside world.
These are the G functions in Figure 3 and are typically control laws or display
algorithms. With the SIM structure, all software programming for each microprocessor has
been reduced to a simple set of interactions with the state information matrix.
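
The three function classes can be illustrated with a toy SIM. The sketch below models the SIM as a dictionary of named locations; the location names, the first-order filter (standing in for something like a Kalman filter) and the proportional control law are all assumptions chosen for brevity, not taken from the CRM2FCS software.

```python
# State information matrix: named locations holding the aircraft state.
sim = {"pitch_rate_raw": 0.0, "pitch_rate": 0.0, "elevator_cmd": 0.0}


def f_read_gyro(sim, sample):
    """F function: store processed raw sensor data in the SIM."""
    sim["pitch_rate_raw"] = float(sample)


def h_smooth_pitch_rate(sim, alpha=0.5):
    """H function: refine data already in the SIM and store it back.

    A first-order low-pass filter is used here as a simple placeholder.
    """
    sim["pitch_rate"] += alpha * (sim["pitch_rate_raw"] - sim["pitch_rate"])


def g_elevator_law(sim, gain=-2.0):
    """G function: turn SIM data into an output for the outside world."""
    sim["elevator_cmd"] = gain * sim["pitch_rate"]
    return sim["elevator_cmd"]


for sample in (1.0, 1.0, 1.0):   # three frames with a constant gyro reading
    f_read_gyro(sim, sample)
    h_smooth_pitch_rate(sim)
    command = g_elevator_law(sim)
print(sim)
```

Each function touches nothing but the SIM, so any processor holding a copy of the SIM can run any of them, which is what continuous reconfiguration requires.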

3.1  Implementation  Of  Virtual  Common  Memory 

The  implementation  of  the  virtual  common  memory  in  hardware  (shown  in  Figure  4)  utilizes 
the  simple  serial  bus  structure  described  earlier.  Each  unit  interfaced  to  the  serial 
bus  is  referred  to  as  a  processing  module.  A  processing  module  consists  of  a 
microprocessor,  local  memory,  transmitter,  receiver,  and  a  copy  of  the  state  information 
matrix.  Each  processing  module  independently  determines  which  task  it  must  do  next.  It 
accesses  variables  from  the  local  SIM  which  are  needed  to  do  a  computation.  When  the 
algorithm  has  been  completed,  the  data  and  its  location  in  the  SIM  are  placed  in  the 
processing  module's  transmitter  buffer.  The  transmitter  circuit  automatically  searches 
for  an  available  bus  and  transmits  the  information.  Every  processing  module  receiver, 
including  the  originating  processing  module,  receives  the  data.  Through  a  direct  memory 
access,  the  data  is  then  placed  in  the  proper  location  in  the  SIM  of  every  processing 
module.  Each  processing  module  maintains  an  identical  copy  of  the  SIM.  As  far  as  any 
processing  module  is  concerned,  the  SIM  appears  to  be  entirely  within  its  own  local 
memory.  Using  this  concept,  processors  connected  by  a  simple  serial  bus  appear  to  share 
one  common  memory  containing  all  information  in  the  system.  This  greatly  simplifies 
programming  by  reducing  interprocessor  communication  to  simple  reads  and  writes  on  a 
virtual  common  memory. 
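
The mechanism can be summarised in a small model. The sketch below is a simplification under stated assumptions: the classes `Bus` and `ProcessingModule` are illustrative names, the bus delivers each word instantly, and the transmitter buffer, bus search and contention are left out. It shows only the essential point that a write is broadcast and lands in every local SIM copy, while reads are purely local.

```python
class Bus:
    """Serial bus that delivers every word to every attached receiver."""

    def __init__(self):
        self.modules = []

    def broadcast(self, location, value):
        for module in self.modules:      # includes the originating module
            module.receive(location, value)


class ProcessingModule:
    def __init__(self, bus):
        self.sim = {}                    # local copy of the SIM
        self.bus = bus
        bus.modules.append(self)

    def write(self, location, value):
        """Publish a result; the local copy is updated via the bus as well."""
        self.bus.broadcast(location, value)

    def receive(self, location, value):
        """Stand-in for the direct memory access into the local SIM copy."""
        self.sim[location] = value

    def read(self, location):
        return self.sim[location]        # purely local access


bus = Bus()
modules = [ProcessingModule(bus) for _ in range(3)]
modules[0].write("airspeed", 212.5)
print([m.read("airspeed") for m in modules])
```

From the programmer's side only `read` and `write` are visible, which is the sense in which the bus "looks like" a common memory.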


Fig. 4 CRM2FCS Architecture Elements

4. BUS CONTENTION

The virtual memory concept requires a great deal of information transfer and therefore called
for a new approach to bus contention which would allow the processors to compete for access to
a serial bus without the need for a central controller. The bus contention scheme
presented greatly increases the efficiency of bus utilization and allows improved
bandwidth, expandability, and reliability over other conventional approaches. The
technique also permits simple precise scheduling of transmission on the bus to virtually
eliminate the effects of transmission delay in the system (Larimer, S.J. and Maher, S.L.,
MAY, 1981).



Time on the bus is divided into a series of consecutive intervals (slots) that are
exactly one transmission word long (32 to 46 bits, depending on word format). At the
beginning of each new slot, all processors with something to transmit compete to fill
the slot with a word of data. The resulting massive bus collision is then resolved using
a technique called "transparent contention". Transparent contention is a scheme which
allows  collisions  to  occur  on  the  bus  in  a  manner  such  that  only  one  of  the  colliding 
messages  survives.  All  other  messages  are  automatically  suppressed  without  wasting  a  bit 
of  transmission  time  during  the  collision.  As  a  result,  the  slot  is  filled  with  one  and 
only  one  data  word  and  competition  moves  on  to  the  next  available  interval. 

In order to ensure that there is always data available for transmission, each processor
maintains  a  queue  of  words  to  be  transmitted.  As  each  new  piece  of  data  is  generated,  the 
processor  places  it  into  a  first-in-first-out  (FIFO)  buffer.  A  special  transmitter 
circuit  is  then  responsible  for  emptying  the  FIFO  onto  the  bus  by  competing  for  time 
slots  with  all  other  transmitters  in  the  system.  This  frees  the  processor  from 
transmission  considerations  and  ensures  a  constant  flow  of  data  onto  the  bus. 
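
The slot-by-slot behaviour can be sketched as follows. The function `run_slots` is a hypothetical model: each processor owns a FIFO of words, and in every slot the head words of all non-empty FIFOs collide. How the collision is resolved is abstracted into the `resolve` argument; using `min` is an assumption that matches the zero-dominant bus the paper goes on to describe, provided words are sent most significant bit first.

```python
from collections import deque


def run_slots(queues, resolve=min):
    """Fill consecutive bus slots from per-processor FIFO queues.

    queues  -- dict mapping processor name to a deque of words (ints)
    resolve -- picks the surviving word among the colliding head words;
               min is an assumption standing in for transparent contention
    Returns a list of (slot, processor, word) in transmission order.
    """
    log = []
    slot = 0
    while any(queues.values()):          # some FIFO still holds a word
        heads = {name: q[0] for name, q in queues.items() if q}
        winning_word = resolve(heads.values())
        # If two heads are identical, the first such processor is credited.
        winner = next(n for n, w in heads.items() if w == winning_word)
        queues[winner].popleft()
        log.append((slot, winner, winning_word))
        slot += 1
    return log


queues = {"P1": deque([0x30, 0x10]), "P2": deque([0x20]), "P3": deque([0x40])}
for entry in run_slots(queues):
    print(entry)
```

Four queued words occupy exactly four slots: every slot carries one word, and a losing transmitter simply competes again in the next slot.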

The essential elements of the bus architecture are shown in Figure 5. Three processing
modules are shown interconnected by a common serial bus made up of a data line and a
clock line. Each processing module consists of an ordinary microcomputer with two I/O
devices: a broadcaster (B) and a receiver (R). These devices use the signal on
the clock bus to synchronize data transmission and reception. "T" in the figure is a bus
termination circuit which generates the clock signal, terminates the clock and data
busses,  monitors  the  busses  for  faults,  and  generates  synchronization  pulses  for  the 
processing  modules. 


Fig.  5  Essential  Architecture  Elements 


Fig.  6  Transmitter-Bus  Interface 


Access is granted to the bus on a first-come first-serve basis. While one transmitter is
actively using the bus, a logical BUSY signal is maintained which prevents any other
transmitter  from  initiating  a  broadcast.  This  eliminates  many  conflicts,  but  the 
probability  is  high  that  more  than  one  transmitter  will  initiate  a  transmission  on  the 
same  clock  pulse.  When  this  happens,  some  other  method  is  required  to  resolve  the  bus 
contention  problem. 


The  solution  is  found  by  observing  exactly  what  happens  when  two  transmitters  try  to  put 
data  on  the  bus  at  the  same  time.  Figure  6  shows  each  transmitter  connected  to  the  bus  by 
an  open  collector  transistor  buffer.  When  the  transmitter  puts  a  "0"  on  the  bus,  the 
output  transistor  drives  the  bus  to  ground.  To  transmit  a  "1"  the  transistor  is  turned 
off,  allowing  the  bus  to  float  high,  because  of  the  pull-up  resistor.  As  long  as  no 
transistor  is  turned  on,  the  bus  will  remain  floating  at  a  logic  "1";  but  if  any  of  the 
transistors  turn  on,  the  bus  will  be  pulled  to  the  logic  "0"  state. 

The  net  result  is  that  logic  zeroes  have  an  inherent  priority  on  the  bus.  Because  a  "1" 
is  transmitted  by  releasing  the  bus  while  a  "0"  is  transmitted  by  actively  pulling  the 
bus low, units transmitting zeroes will always have priority over those sending ones.
This  fact  is  used  to  develop  an  effective  arbitration  scheme. 

The  key  to  this  scheme  is  that  every  transmitter  constantly  compares  what  it  is  trying  to 
put  on  the  bus  with  what  is  actually  there.  In  the  event  of  a  disagreement,  the 
transmitter  simply  stops  sending,  waits  for  the  bus  to  become  available  again,  and 
retransmits.  This  approach  works  because  when  any  two  processors  disagree,  only  one  of 
them  detects  the  disagreement  and  drops  off.  The  other  transmitter  does  not  detect  the 
difference,  because  of  the  logic  level  priority,  and  continues  its  transmission.  No  bus 
time  is  wasted  because  one  message  is  completed  without  interruption. 

This concept works equally well for any number of transmitters in contention. If ten
transmitters start simultaneously, they all send in parallel until there is a
disagreement. Any transmitter attempting to send a one will then drop off while those
transmitting zeroes will continue. Eventually, only one transmitter is left and it
completes its transmission, completely unaware that it has been contending for the bus.
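
The arbitration rule described above can be illustrated with a short simulation. This is
only a sketch under stated assumptions, not the laboratory implementation: the function
name arbitrate, the use of bit strings, and the requirement that all contending messages
have equal length and start on the same clock pulse are choices made for the example.
Messages are also assumed to differ in at least one bit; identical messages would both
survive to the end.

```python
def arbitrate(messages):
    """Simulate contention on an open-collector (wired-AND) serial bus.

    messages maps a transmitter name to the bit string it tries to send.
    Returns (survivors, bus_bits, dropped_at), where dropped_at maps each
    losing transmitter to the bit index at which it saw a disagreement.
    """
    lengths = {len(bits) for bits in messages.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes equal-length messages")
    active = sorted(messages)
    bus = []
    dropped_at = {}
    for i in range(lengths.pop()):
        # Any active transmitter sending "0" pulls the bus low.
        level = "0" if any(messages[n][i] == "0" for n in active) else "1"
        bus.append(level)
        # A transmitter that sent "1" but reads "0" stops sending.
        for n in active:
            if messages[n][i] != level:
                dropped_at[n] = i
        active = [n for n in active if n not in dropped_at]
    return active, "".join(bus), dropped_at


winners, bus, dropped = arbitrate({"A": "1011", "B": "1001", "C": "1101"})
print(winners, bus, dropped)
```

In this run the bus carries exactly the bits of transmitter B, which never observes a
disagreement, while C and A drop off at the first bit where they tried to send a one
against a zero.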

4.1  The  Multi-Bus  Concept 

The  bus  structure  described  represents  a  very  simple  way  to  interconnect  a  large  number 
of  autonomous  processors  without  need  of  a  central  controller.  However,  a  single  bus 
system  of  any  kind  is  generally  unacceptable  from  a  reliability  standpoint.  At  the  very 
least, some form of redundancy is required in order to avoid a potential single point
failure in the system. Also, a single serial bus has a finite bandwidth. A large system
of  processors  exchanging  massive  amounts  of  data  can  quickly  saturate  such  a  bus,  making 
further  system  expansion  impossible.  The  approach  proposed  in  this  paper  is  ideally 
suited  to  expansion  to  as  many  busses  as  are  needed  to  meet  the  reliability  and 
throughput requirements of almost any system (Larimer, S.J. and Maher, S.M., May 1981).


This bus design has tremendous flexibility. There are four serial busses used in the
in-house program. The bus bandwidth of the system is exactly four times that of a single
bus and can be expanded still further with additional busses. Reliability is also
improved. Selection of an alternate bus in the event of a failure is instantaneous and
automatic because processor to bus connections are continuously reconfiguring.


5.  TASK  ASSIGNMENT  RULES


Another major outgrowth of this research has been the development of a method for
programming a multiprocessor system (Larimer, S.J., June 1981). Programming a system
consisting of a large number of processors can become a formidable task. Figure 7a shows
how four different processors might be programmed in a multiprocessor system. As each
processor completes a task it goes on to the next one immediately. This approach is very
difficult to synchronize. For example, processor 2 does task A while processor 4 does
task B and processor 3 does task C, which combines the results of tasks A and B. If task B
is not completed before task C is started, then task C will not have the information
needed to complete its calculations. This possibility can greatly increase the complexity
of the software. A second problem with this programming technique is that it is very
difficult to modify. If a block of software requires rewriting or a new algorithm must be
added, the timing of the software will be changed. Synchronization must be maintained
between certain tasks, and guaranteeing the synchronization requires revalidation of all
software. One small change in the software will therefore influence the software
validation of the entire system.


5.1  Quantized  Software


The programming method used in the CRM2FCS is called the quantized-software approach
(Figure 7b). Every task is given an integer number of time intervals depending upon the
length of the task. In this particular system, every interval is one millisecond long
and is referred to as a millimodule. Every task is some integer number of millimodules
long. For example, a task which would normally be executed in 1.5 milliseconds would be
allocated two complete millimodules. This allows control over which tasks are being
performed during any given interval of time, so strict synchronization of tasks can be
maintained. Data is exchanged only on boundaries between millimodules. As a result, the
availability of data for subsequent tasks is known during any millimodule. Since
software modules are the same size they can be easily interchanged.


The quantized-software approach obviously sacrifices some throughput; for example, a 1.5
millisecond task now takes two milliseconds, and therefore is less efficient than the
continuous software method. However, the sacrifice in throughput is well justified in
view of the added software simplicity and flexibility. Additional throughput can be added
by simply adding more processing modules while maintaining the software simplicity and
flexibility.
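
The cost of quantization can be made concrete with a small calculation. The following is
a minimal sketch; the function names are hypothetical and only the one millisecond
interval and the 1.5 millisecond example come from the text.

```python
import math

MILLIMODULE_MS = 1.0  # interval length stated in the text


def millimodules_needed(task_ms, interval_ms=MILLIMODULE_MS):
    """Number of whole intervals that must be allocated to a task."""
    return math.ceil(task_ms / interval_ms)


def quantization_overhead(task_ms, interval_ms=MILLIMODULE_MS):
    """Processor time (ms) left idle because the task is rounded up."""
    return millimodules_needed(task_ms, interval_ms) * interval_ms - task_ms


print(millimodules_needed(1.5))    # 2 millimodules
print(quantization_overhead(1.5))  # 0.5 ms of idle time
```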


5.2  Reconfiguration


Reconfiguration of the CRM2FCS occurs once every ten milliseconds. This rate is
arbitrary and could be adjusted to a slower rate if data gathered from the laboratory
implementation indicates the rate is unnecessarily high. Figure 8 shows how
reconfiguration fits into the software scheme. A processing module health status table
is maintained in the SIM. At the beginning of every major frame the status table is
"zeroed out", as in task A of Figure 8b.




Fig.  8  Volunteering  and  the  Volunteering  Status  Table 

Each processing module must then perform a self-check to determine its own health, task B
in Figure 8b. The processing module then broadcasts its health status which becomes
available in the SIM processor status table. A processing module can then determine the
task set it will be required to do during the next major frame, task C in Figure 8b. To
do this, each processing module accesses a specific variable in the SIM. The random
nature of the variable is used to generate an offset pointer which every processing
module uses to determine its starting point in the SIM processor status table. In Figure
8a, for example, the random offset pointer is pointing at processing module 7.
Processing module 7 will therefore do the highest priority task set during the next major
frame. Processing modules 8 and 9 have "0" status indicating they are unavailable for
task assignment. Processing module 10 will determine that it must do the second highest
priority task set since 8 and 9 are unavailable. Similarly processing modules 1 and 2
will do task sets 3 and 4 and processing modules 4 and 5 will do task sets 5 and 6. This
process is repeated every major frame so that task sets are randomly distributed among
functioning processors.
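
The volunteering rule can be sketched as follows. This is an illustration only: in the
system each processing module works out its own task set from the SIM, whereas the
hypothetical function assign_task_sets below computes the whole assignment in one place.
The status values for modules 3 and 6 and the total of six task sets are assumptions
chosen to be consistent with the example in the text.

```python
def assign_task_sets(status, offset, num_task_sets):
    """Map healthy modules to task-set priorities (1 = highest).

    status: list of 0/1 health flags; index 0 is processing module 1.
    offset: 1-based module number selected by the random offset pointer.
    """
    n = len(status)
    assignment = {}
    next_set = 1
    for step in range(n):
        # Walk the status table circularly, starting at the offset pointer.
        module = (offset - 1 + step) % n + 1
        if status[module - 1] == 1 and next_set <= num_task_sets:
            assignment[module] = next_set
            next_set += 1
    return assignment


# Modules 8 and 9 are unavailable (from the text); module 3 is assumed
# unavailable and module 6 assumed healthy.
status = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
print(assign_task_sets(status, offset=7, num_task_sets=6))
```

With the pointer at module 7 this reproduces the assignment described above: module 7
takes task set 1, module 10 takes set 2, modules 1 and 2 take sets 3 and 4, and modules
4 and 5 take sets 5 and 6.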


Fig.  9  A  Generic  Task  Assignment  Chart 
5.3  Task  Assignment  Chart 

The task assignment chart is used to organize all of the one millisecond software modules
(millimodules) to be used in the system (Larimer, S.J., May 1981). Figure 9 illustrates
how the task assignment chart is organized. The vertical axis represents the number of
processors in the system. The horizontal axis is divided into one millimodule time
increments (milliframes). Ten milliframes form a minor frame and 3 minor frames complete
a major frame. Every task is performed at least once during every major frame. To use
the chart, the programmer first divides a function into a group of subfunctions each of
which requires at most one millisecond to execute. Each of these subfunctions is then
designated as a millimodule and placed in a convenient location in the task assignment
chart. In Figure 9, function F(f1, f2, f3, f4) executes in four consecutive time
intervals beginning with milliframe 1. Function G(g1, g2, g3, g4, g5) executes entirely
in parallel requiring five processors and only one milliframe. Function H(h1, h2, h3,
and h4) first generates intermediate results in parallel and then combines them in
milliframe 6. Various iteration rates may be achieved by assigning the same function
several times in the same chart as shown for function K(k1). The task assignment chart is
used to easily distribute tasks among available processing modules.
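
One way to picture the chart is as a grid of slots indexed by processor and milliframe.
The sketch below is hypothetical: the class TaskAssignmentChart, the five-processor by
ten-milliframe size, and the particular rows and columns chosen for G and for the
parallel part of H are assumptions, since Figure 9 is not reproduced here. Only the
shapes of F, G, and H follow the text.

```python
class TaskAssignmentChart:
    """Grid of millimodule slots: one row per processor, one column per milliframe."""

    def __init__(self, processors, milliframes):
        self.processors = processors
        self.milliframes = milliframes
        self.slots = {}  # (processor, milliframe) -> millimodule name

    def place(self, name, processor, milliframe):
        if not (1 <= processor <= self.processors
                and 1 <= milliframe <= self.milliframes):
            raise ValueError("slot outside the chart")
        key = (processor, milliframe)
        if key in self.slots:
            raise ValueError(f"slot {key} already holds {self.slots[key]}")
        self.slots[key] = name

    def row(self, processor):
        return [self.slots.get((processor, m), "-")
                for m in range(1, self.milliframes + 1)]


chart = TaskAssignmentChart(processors=5, milliframes=10)
# F runs sequentially on one processor, starting in milliframe 1.
for m, name in enumerate(["f1", "f2", "f3", "f4"], start=1):
    chart.place(name, processor=1, milliframe=m)
# G runs entirely in parallel: five processors, one milliframe (column assumed).
for p, name in enumerate(["g1", "g2", "g3", "g4", "g5"], start=1):
    chart.place(name, processor=p, milliframe=7)
# H computes intermediate results in parallel, then combines them in milliframe 6.
for p, name in enumerate(["h1", "h2", "h3"], start=2):
    chart.place(name, processor=p, milliframe=5)
chart.place("h4", processor=2, milliframe=6)

for p in range(1, 6):
    print(p, " ".join(f"{cell:>2}" for cell in chart.row(p)))
```

Rejecting a second millimodule in an occupied slot is what keeps the schedule strictly
synchronized: each processor does exactly one known thing in each milliframe.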

5.4  Task  Assignment  Compiler 

The  task  assignment  compiler  is  currently  under  development  at  the  Flight  Dynamics 
Laboratory.  It  is  an  automated  method  for  generating  the  task  assignment  chart.  The 
task assignment chart rapidly becomes difficult to work with as the number of processing
modules and the number and rate variations of tasks increase. Data
concerning  each  of  the  millimodules  is  input  to  the  task  assignment  compiler  and  a 
complete  task  assignment  chart  and  data  file  for  the  processing  modules  is  generated. 

Millimodules  are  given  an  identification  number  or  name  when  they  are  written.  The 
millimodule  identification,  required  repetition  rate,  and  data  I/O  requirements  are  input 
to the task assignment compiler. The compiler automatically rearranges millimodules to
make room for new millimodules and indicates to the user whether additional processing
modules will be required to accomplish all tasks. This method of simplifying the
software development will further reduce the workload for the programmer.

5.5  Software  Simulation 

A software simulation of the CRM2FCS hardware and software is also being developed at the
Flight Dynamics Laboratory. The simulation output will be compared to results obtained
from the laboratory system. Discrepancies between the simulation and laboratory system
will be analyzed and improvements made to the simulation or laboratory system as required
until the simulation can be verified as accurately representing the laboratory system.
The software simulation can then be used to predict the effects of changes to the
baseline system without having to make changes to the system itself. The effects on
throughput or bus utilization of adding more processing modules, using a different
microprocessor, or changing the transmitter-receiver hardware can be studied. The
software simulation is expected to be a valuable tool for analyzing advanced
configurations of the CRM2FCS.

6.  LABORATORY  IMPLEMENTATION

An  effort  is  under  way  at  the  Flight  Dynamics  Laboratory  to  demonstrate  the  CRM2FCS 
concepts.  Data  gathered  from  this  in-house  program  will  be  used  to  quantify  the  extent  to 
which  expected  benefits  and  limitations  of  the  architecture  are  met.  A  validated  software 
simulation of the system will then be used to project throughput, fault tolerance, and
other  quantifiable  characteristics  of  modifications  to  the  baseline  hardware. 

The  in-house  facility,  shown  in  Figure  10,  has  been  designed  to  maximize  data  gathering, 
data  reduction  and  programmability  of  the  system.  The  basic  CRM2FCS  architecture  is 
represented  by  the  six  processing  modules  and  bus  termination  circuit  shown  in  the 
figure. The remaining blocks represent interfaces to an aircraft simulator, cockpit CRT
display,  data  gathering,  data  reduction,  and  software  development  facilities. 




A processing module consists of a 16-bit microcomputer, 8 Kwords of memory, and custom
engineered transmitter, receiver, and state information matrix (SIM). At this writing a
processing  module  has  been  successfully  implemented  in  the  laboratory.  The  custom 
circuitry  uses  small  and  medium  scale  integrated  circuits.  A  future  effort  could  put  the 
circuitry  in  a  single  large  scale  integrated  circuit. 

The  block  labeled  "68000"  is  a  state  of  the  art  16-bit  microcomputer  which  will  be  used 
for  a  single  axis  digital  aircraft  simulation.  It  is  interfaced  through  a  dedicated 
processing  module  to  demonstrate  one  method  of  accessing  external  system  components  such 
as  sensors  and  actuators.  A  follow-on  effort  will  use  an  analog  computer  to  do  more 
complex  aircraft  simulations. 

The  block  marked  "8002"  is  a  Tektronix  microprocessor  development  system.  It  is  used  for 
both  hardware  and  software  development.  It  has  a  direct  hardware  interface  to  a 
processing  module's  microprocessor.  The  "8002"  can  download  software  to  the  processing 
modules prior to a simulation run. After the simulation the "8002" is used to make
software  modifications  based  on  data  gathered  during  the  simulation.  The  new  software  can 
then  be  rapidly  downloaded  and  the  system  brought  up  for  another  run. 

A  Radio  Shack  TRS-80  is  used  in  conjunction  with  a  dedicated  processing  module  and  custom 
serial  bus  interface  to  gather  data  during  a  simulation  run.  The  processing  module  is 
used  to  monitor  the  history  of  specific  variables  in  the  SIM.  The  serial  bus  interface  is 
used  to  gather  raw  data  from  each  of  the  four  serial  busses.  The  TRS-80  then  processes 
the  data  to  pinpoint  specific  problems  and  to  determine  bus  utilization  and  system 
throughput.  The  TRS-80  also  controls  the  RS-232  switching  circuit.  This  circuit  allows 
data  and  software  to  be  easily  transferred  between  the  major  components  of  the  test 
system. 

The real-time display controller is a microprocessor-based color graphics display which
can be configured as a cockpit instrumentation display or be used to monitor the system
status in real time. The display controller also has a joystick input which can be used in
more advanced aircraft simulations. The real-time display controller also demonstrates
the ease with which the architecture can be interfaced to other aircraft subsystems.

The Tektronix 4081 is a stand-alone minicomputer with graphics capability and a link to a
mainframe computer. It is used for further data reduction and display and for the
development of complex software for the millimodule compiler and software simulation.

7.  CONCLUSION 

There are three major potential benefits to designing a flight control system using the
methods described in this paper. The first is simply expandability as system needs grow.
It is a well known fact that from the time the first model of a particular aircraft rolls
off the assembly line until the last one lines up in mothballs, there are innumerable
changes that occur to the system. This causes excessive increases in cost due to the
difficulties of changing hardware and adding new software to the system. The CRM2FCS
approach has the potential to greatly reduce these costs. Modularity of both hardware and
software allows considerably easier expandability.

A second potential benefit is the ability to reduce software costs, which are the single
biggest cost in digital systems today. By designing an architecture that is inherently
easier to program, the cost of programming, maintaining, and updating software can be
greatly reduced. This contributes to a reduction in life cycle costs.

The third potential benefit is the possibility of greatly reducing unscheduled
maintenance. With the present redundant flight control computers, if any component of the
computer has failed the aircraft is not allowed to take off. As digital technology
progresses, it will become practical to configure the CRM2FCS with as many as one hundred
processors. If only 40 processors are required to accomplish the necessary processing
there will be 60 spare processors. A requirement that 20 spares be available before the
aircraft takes off leaves 40 processors that can fail before the aircraft is grounded.
When scheduled maintenance occurs, any failed processors can be replaced. Since it is
unlikely 40 processors will fail between maintenance periods, the goal of no unscheduled
maintenance can be closely approached.
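
The spare-processor arithmetic in this example can be written out as a small check. The
function names below are hypothetical; the figures of 100 total processors, 40 required,
and 20 spares at takeoff are the ones used in the text.

```python
def spare_budget(total, required, min_spares):
    """Return (spares, failures tolerated before the aircraft is grounded)."""
    spares = total - required
    if spares < min_spares:
        raise ValueError("configuration can never satisfy the spares requirement")
    return spares, spares - min_spares


def may_take_off(failed, total=100, required=40, min_spares=20):
    """True if enough healthy spares remain after the given number of failures."""
    return total - failed - required >= min_spares


print(spare_budget(100, 40, 20))  # (60, 40)
print(may_take_off(40))           # True: exactly 20 spares remain
print(may_take_off(41))           # False: only 19 spares remain
```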


REFERENCES 

Larimer, S.J., June 1981, "Managing Software in a Continuously Reconfiguring
Multi-Microprocessor  System",  Proceedings  of  the  1981  Joint  Automatic  Control 
Conference. 

Larimer,  S.J.  and  Maher,  S.L.,  May  1981,  "A  Solution  to  Bus  Contention  in  a  System  of 
Autonomous  Microprocessors",  Proceedings  of  the  IEEE  1981  NAECON  Symposium. 




EXPERIENCES WITH THE EXPERIMENTAL FFM-MCS


Hermann v. Issendorff
FGAN - FFM
Königstr. 2
5307 Wachtberg-Werthhoven
Germany


SUMMARY 


The FFM-Multicomputersystem was built up to investigate the utilization of microprocessor
based computing networks for the various requirements of embedded data processing and
control in military systems. Being designed as an adaptable building block system, the
FFM-MCS is serving as a testbed for research on distributed data processing. The paper
deals with a general method for the design of process-networks followed by the adaption
of adequate hardware-networks. Several types of messages are introduced for efficient
and safe communication between autonomous process-modules. Finally some improvements
of the hardware building block system are presented.

1.  INTRODUCTION

Distributed systems seem to be particularly well suited for embedded data processing
and control in military applications like airborne systems. Apparently there are numerous
advantages which make distributed systems preferable to conventional monolithic
systems.

Some  attributes  seem  to  be  of  particular  importance  to  airborne  systems: 

.  Distributed systems can be designed as a network of autonomous computers. A network
of this kind constitutes a good base for the construction of fault tolerant, fail
soft and damage resistant architectures.

.  Distributed systems can be built up with only a few types of different components
which may even be mass-produced commercial products. This may result in easier
maintenance and repair as well as in lower hardware costs.

.  Last but not least there are indications that software production, i.e. programming and
testing, could become much easier than with conventional data processing systems. The
same holds for subsequent extensions and changes of the network. Hence there is a
good chance that the life-cycle costs of distributed systems can become considerably lower.

On the other side the science of distributed systems is still in an infant state. Even
the term distributed system is not rigidly defined yet. There is no profound knowledge
of how to control a network of autonomous nodes. The data transfer in a distributed
system of this kind will be greatly increased compared to monolithic systems. It is
not clear so far if this problem can be sufficiently solved or if it presents a severe
restriction to the utilization of distributed systems. There isn't any design methodology
available and there exists no language which supports programming of autonomous
processes and their communication. The promising aspects on the one side and the
unsolved or even undetected problems on the other side gave rise to a long term research
project at the FFM in Werthhoven. Some of the main results of this work are presented in
this paper.




It begins with a description of the general approach to decompose the data processing
task of a given application, the formation of a network of process modules and the
adaption of a hardware network to the network of process modules. Paragraph 3 describes
the hardware building block approach and the experimental FFM-MCS (Multicomputersystem).
Paragraph 4 deals with the important subject of communication, especially with the
definition of messages and simultaneous message execution. Improvements of the hardware
building block system which increase flexibility and decentralization will finally be
discussed in paragraph 5.

2.  THE  DISTRIBUTED  SYSTEM  DESIGN  APPROACH 


The first step towards a distributed system consists of a decomposition of a data
processing task into a set of functions which are interrelated by their input and output
parameters. This first step resembles very much that of Mascot (Mascot Suppliers Ass., 1980).
But while our functions are nearly identical with the activities of Mascot, we do not
introduce IDA's (intercommunication data areas). A function is not a well-defined
object. It is likely that this object has a minimal interface in relation to its
complexity but that may not be true. A function may be further partitioned, as vice versa
adjacent functions may be combined to one function. The final size of a function will
be confined later by the features of the hardware network. An example of decomposition
is shown in figure 1. Simple as it is, it shows already how a processing task can be
executed by macropipelining to increase the throughput (Handler, W., 1973). In general
a data processing task will be decomposed into many more functions resulting in a more
complex network. Functions in separate paths are independent and hence may be executed
in parallel.


Process Characteristics

•  Autonomous processable portion of a given DP-problem
•  Sequential program
•  Supplied with all resources during runtime
•  Unambiguous name

Process Module Structure

Head: Name, Priority, Size
Buffer: State, OS-kernel information, Telegrams
Data and Working Space
Program and Subroutines


Figure 1: Functional decomposition

Figure 2: Characteristics and structure of a process

In a next step each function is realized as a sequential program and embedded in a
process module. Any conventional higher order language suitable for the kind of
application may be used for the internal programming of a process module as long as the
special characteristics of the process module are taken into consideration. A process
module is self-contained and autonomous. Figure 2 shows the process characteristics
and the module structure. The only way a process module can be accessed is by
messages from other modules. Vice versa a process module cannot access anything else
but another process module. Hence any data base or device must be embedded in a
process module. Such a process module is called a monitor. All functions of the
operating system are represented as monitors too. Monitors are not as mobile as ordinary
process modules are. Figure 3 represents an example of a process-network.

Communication between process modules takes place by sending or receiving messages
(Walden, D.C., 1972). The sending and the receiving process must agree to the message
before the transfer is actually executed. This message concept makes the processes
really autonomous and protects them against erroneous information from other processes.
Communication by messages will be treated in detail later on. But it may be mentioned
that the message type being introduced in the language Ada is not sufficient and has
to be backed up by other types of messages.


Figure  3:  A  typical  process-network 


Figure 4: Building block system I (net elements: autonomous computer (node),
bidirectional channel (line); some net configurations)


What has been constructed up to this point is a pure software-network. Nothing has been
said so far about the hardware system into which the software-network is to be loaded
and where it is to be processed. Indeed the software-network could be loaded into any
computer system no matter if it is a single processor architecture, a multiprocessor
architecture or even a computer-network. This independence of the hardware architecture
represents an ideal base for reconfiguration. On the other hand it allows the
hardware architecture to be adapted to the requirements of a given application. This
could be particularly beneficial if the hardware were composed from a building
block system consisting of a few highly standardized components.

3.  THE  EXPERIMENTAL  FFM-MCS 


A building block system with a high degree of flexibility on which our research is
based has been described earlier (v. Issendorff, H. and Grünewald, W., 1980). Figures 4
and 5 depict the main features. The hardware-network can be adapted to a given
process-network in two levels. At the higher level autonomous computers (nodes) can be
interconnected by channels (lines) to an arbitrary network, i.e. to a network with an
arbitrary number of nodes which are interconnected in an arbitrary manner. The channels
are only necessary in a logical sense. Several channels can always be combined to one
bus if this is desirable. No other difficulties would arise from this but possibly a
bus contention problem.




At the lower level each autonomous computer can be equipped with several processors,
memory modules and peripheral controllers besides the communication links which
serve for the interconnection of the channels.

In general a process-network will not have an equal flow of information between
processes. There will be groups of processes which communicate heavily while others will
have a rare or small data transfer only. For efficiency reasons these groups are
preferably clustered in one node or at least in nodes which are directly connected. A node
which contains several processes should of course be equipped with several processors.


Figure 5: Hardware building block system II

Figure 6: Some configurations of the FFM-MCS (legend: P: Processor; CL: Comm. link;
SMD: System Message Device; MDS: Microcomputer Dev. System)

The experimental FFM-MCS is constructed in very much the same way. Some restrictions
and some compromises had to be accepted because of the utilization of commercial
products. The base of the building block system is the SUE-minicomputer from Lockheed
Electronics which has a bus-oriented architecture and can be extended up to 4
processors. Figure 6 exhibits two different configurations. Each node has its own operating
system kernel. This kernel handles several types of supervisor calls, e.g. calls to
allocate processors to processes which are ready to be started and calls to control the
communication with adjacent nodes. The set of operating system kernels represents the
basic operating system of the network. Figure 7 depicts the logical system structure
which displays the different layers and the kind of communication between them.

Bootstrapping of the MCS is done stepwise. It begins with a node which is reset by hand.
This node then resets, loads and starts its neighbour nodes, which then do the same with
their next neighbours until the whole system is bootstrapped. It does not matter where
the bootstrap begins provided that the first node has access to a data base where the
basic operating system is stored. But each bootstrap path must be predefined in order
to guarantee that each node will be bootstrapped only once.
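
The requirement that each node be bootstrapped exactly once amounts to fixing a spanning
tree of the network in advance. The sketch below derives such a tree with a breadth-first
search; this is an illustrative choice, not necessarily how the bootstrap paths of the
FFM-MCS were defined. The function name bootstrap_plan and the four-node example network
are hypothetical.

```python
from collections import deque


def bootstrap_plan(neighbours, first_node):
    """Derive bootstrap paths so that every node is started by exactly one neighbour.

    neighbours: dict mapping each node to the list of nodes it is connected to.
    Returns (parent, order): parent[node] is the node that resets, loads and
    starts it (None for the hand-reset node); order is the bootstrap sequence.
    """
    parent = {first_node: None}
    order = [first_node]
    queue = deque([first_node])
    while queue:
        node = queue.popleft()
        for nb in neighbours[node]:
            if nb not in parent:  # skip nodes already claimed by another path
                parent[nb] = node
                order.append(nb)
                queue.append(nb)
    return parent, order


network = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
parent, order = bootstrap_plan(network, "A")
print(order)
print(parent)
```

Node D is reachable from both B and C, but the plan assigns it to B only, so it is
loaded once even though two bootstrapped neighbours could reach it.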




4.  COMMUNICATION 

A conventional monolithic computer permits direct access to (globally defined) data from
any point of a program. Hence data do not have to be moved very often. This architecture
offers high speed performance but suffers from a rather low reliability because one
single fault may destroy the whole system. A distributed system instead permits high system
protection at the expense of a high load of data transfer. The control of the data
transfer between autonomous process modules even increases this load. Communication is
therefore the most important topic of distributed systems and has to be carefully investigated
in order to preserve system efficiency. The results which we got in this area are
presented in the sequel.


Figure 7: Logical System Structure

Figure 8: Types of Messages


Several types of messages are needed for practical reasons like efficiency or safety.
They are listed in figure 8. Each message consists of a write-instruction in one process
and a corresponding read-instruction in another process. The parameters in the brackets
of both instructions will be checked prior to the information transfer. The first type
is called an open input message. This type corresponds to the rendezvous concept
introduced in Ada (Ichbiah, J.D., et al., 1979). An open input message is necessary if there is a
receiving process which has to accept messages from several other processes. The
receiving process waits until another process contacts it by transmitting its name and
where it is located. This results in a most extensive protocol of 4 steps (figure 9).
The next message type, called open output message, is complementary to the first type.
This message takes care of sending data to any process which applies for them. The
sending process does not know the receiving process until it gets a call which tells
it the name and location of the receiver. The protocol consists of 3 steps only.

A third message type is called private message. This type has been used in CSP (Hoare,
C.A.R., 1978). Here both the sending and the receiving process know each other by name
and location. This message permits a maximum of protection against unauthorized access
by other processes. The protocol consists of 3 steps, as for the open output message.

The open input message is mainly used to transfer information to monitors, e.g. to a
printer-monitor. The open output message on the other hand serves for the case where
several processes compete for one message which is repeatedly produced by the sending
process. This arises for example if a process is duplicated for speed or fault tolerance
reasons, as indicated by dashed lines in figure 3.

The open output message is introduced for efficiency reasons only. It could be
substituted by an open input message followed by a private message, at an expense of
7 steps altogether. In this case the open input message serves merely to transmit the
name and location of the calling process. Even the private message could be substituted
by the open input message, yet with the disadvantages of a longer protocol and reduced
protection.


Figure 9: Message Transfer Protocols

Figure 10: A Simple Case of a Deadlock-Ring


Message type IV is called telegram. It permits transmission of information without
the control of the receiving process. Only one word at a time can be transferred. This
message type is very useful if the information to be transferred is not time critical.
Such information could be, for example, slowly changing data from an input device,
status reports or the like. A telegram has a two step protocol.
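The step counts quoted above for the four message types can be tabulated and combined in a few lines. The following Python sketch is only an illustration: the dictionary keys and the function name are assumptions made here, and only the numbers (4, 3, 3 and 2 steps) are taken from the text.

```python
# Protocol lengths as stated in the text; the identifiers are illustrative.
PROTOCOL_STEPS = {
    "open_input": 4,   # receiver accepts messages from any caller
    "open_output": 3,  # sender serves any process which applies for the data
    "private": 3,      # both partners know each other by name and location
    "telegram": 2,     # one word, no control by the receiving process
}

def substitution_cost(*message_types):
    """Total number of protocol steps when several messages are chained."""
    return sum(PROTOCOL_STEPS[t] for t in message_types)

direct = PROTOCOL_STEPS["open_output"]
emulated = substitution_cost("open_input", "private")
print(direct, emulated)  # 3 7
```

The comparison shows why the open output message is worth having as a type of its own: emulating it costs 7 steps instead of 3.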

There is another important subject with regard to communication which has to be discussed.
Inspection of an average process module shows that there is a group of several
message-instructions at the beginning and another group at the end of the module, with
only a few of them scattered in between. The instructions at the beginning will mostly
be read-instructions, collecting input parameters from other processes, while those at
the end will mostly be write-instructions, which distribute the results to other
processes. There seems to be no reason why the read- and write-instructions should
not be executed in the same sequential order as all other instructions in a process
module. And indeed no problem will show up as long as there are only a few processes
with little communication between them. But this changes with an increasing number of
processes, especially if they are closely interrelated by messages.

A first problem is that the software-network may become trapped in deadlocks although it
is logically correct. The effect is explained in figure 10. The messages in each of
the processes A, B and C are assumed to be independent with regard to their contents.
The messages cannot be executed, however, because of the order of the read- and
write-instructions. Each process tries to execute the instruction of a different message
and is therefore waiting for signals from another process. This results in a
deadlock-ring, which is displayed as a dependency structure.




A deadlock-ring can easily be broken up by reversing the order of messages in one of the
processes. Moreover, deadlock-rings are easy to detect. Checking for them could
even be done at compile-time if there were a suitable higher order language for
distributed systems.
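A check of this kind can be sketched in Python. The sketch assumes a simplified model that is not spelled out in the paper: every process executes its message-instructions strictly in order, every message has exactly two partners, and a message is carried out as soon as it is the next instruction of both partners. The three-process example is hypothetical and is not a reproduction of figure 10.

```python
def deadlock_ring(programs, partners):
    """Return the ring of mutually waiting processes, or None if all messages complete.

    programs: process name -> ordered list of message names
    partners: message name -> (process, process)
    """
    queues = {p: list(msgs) for p, msgs in programs.items()}
    while any(queues.values()):
        # A message is ready when it heads the queue of both partners.
        ready = {q[0] for q in queues.values() if q
                 and all(queues[x] and queues[x][0] == q[0] for x in partners[q[0]])}
        if ready:
            for m in ready:
                for x in partners[m]:
                    queues[x].pop(0)
            continue
        # Nobody can proceed: follow the "waits for" chain until it closes.
        path = [next(p for p, q in queues.items() if q)]
        while True:
            a, b = partners[queues[path[-1]][0]]
            nxt = b if path[-1] == a else a
            if nxt in path:
                return path[path.index(nxt):]
            path.append(nxt)
    return None

partners = {"m1": ("A", "B"), "m2": ("B", "C"), "m3": ("C", "A")}
ring = deadlock_ring({"A": ["m1", "m3"], "B": ["m2", "m1"], "C": ["m3", "m2"]}, partners)
fixed = deadlock_ring({"A": ["m3", "m1"], "B": ["m2", "m1"], "C": ["m3", "m2"]}, partners)
print(ring, fixed)  # ['A', 'B', 'C'] None
```

In the second call the order of the two messages in process A is reversed, which breaks the ring.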

But there is another problem which arises with the sequential execution of
message-instructions: the average delay time of a message increases with the number of
messages in the process-network. The overall delay time becomes proportional to the
number of processes if the processes are interconnected to form a ring (figure 11).


Figure 11: Consecutive and Simultaneous Communication
(Panels: consecutive communication, worst case, and simultaneous communication, best case,
for rings of 4 and 6 processes. With 4 processes the consecutive case carries out A-B,
B-C, C-D and A-D one after the other; the simultaneous case carries out A-B together
with C-D, then B-C together with A-D. Legend: A-B denotes a message transfer between A
and B in either direction; a dot denotes a waiting process.)

Figure 12: The Janus Processor
(Labels: processor; processor with additional memory (private memory); peripheral
controller; communication link.)

Both problems, the deadlock problem and the delay time problem, would be solved if the
message-instructions were executed simultaneously instead of sequentially.
This would reduce the total time for communication in the example of figure 11 to 2 steps.
Simultaneous execution is possible only for messages which are not related with regard
to their information contents. This means that the contents of one message must not be a
function of another. This restriction could not be easily implemented, but there is
another, stronger restriction which is clear and simple. It holds under the additional
assumption that there is a separate buffer for each message-instruction: blocks
of messages, i.e. sequences of message-instructions which do not contain any data
processing, may be executed simultaneously. Simultaneous execution means that the block
of message-instructions is run through repeatedly until all messages are carried out.
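The difference between the two execution disciplines can be reproduced with a small simulation. The Python sketch below rests on assumptions made for illustration only: the processes form a ring, each exchanges one message with each neighbour, a process takes part in at most one transfer per step, and in the sequential case every process lists its messages in the same global order (the worst case). The function names are hypothetical.

```python
def ring_messages(n):
    """One message between each pair of neighbours in a ring of n processes."""
    return [(i, (i + 1) % n) for i in range(n)]

def sequential_steps(n):
    """Each process executes its message-instructions strictly in program order."""
    msgs = ring_messages(n)
    queues = {p: [m for m in msgs if p in m] for p in range(n)}
    steps = 0
    while any(queues.values()):
        ready = [m for m in msgs
                 if queues[m[0]] and queues[m[1]]
                 and queues[m[0]][0] == m and queues[m[1]][0] == m]
        if not ready:
            raise RuntimeError("deadlock-ring")
        for a, b in ready:
            queues[a].pop(0)
            queues[b].pop(0)
        steps += 1
    return steps

def simultaneous_steps(n):
    """The block of message-instructions is run through until all are carried out."""
    pending = ring_messages(n)
    steps = 0
    while pending:
        busy, remaining = set(), []
        for a, b in pending:                 # one run through the block
            if a in busy or b in busy:
                remaining.append((a, b))     # a partner is occupied, retry next pass
            else:
                busy.update((a, b))          # message carried out in this step
        pending = remaining
        steps += 1
    return steps

for n in (4, 6):
    print(n, sequential_steps(n), simultaneous_steps(n))  # 4 4 2, then 6 6 2
```

Under these assumptions the sequential delay grows with the number of processes, while the simultaneous case stays at 2 steps for rings of even size.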

5.  IMPROVEMENTS  OF  THE  HARDWARE  BUILDING  BLOCK  SYSTEM 


The FFM-MCS has been operational for about two years and has been used for several test
applications. Detailed measurements have been carried out to localize bottlenecks
and to gain better insight into the dynamic behavior of the system (Neumann, G.,
et al., 1980). The evaluation of the results led to several improvements with regard to
the hardware structure. These improvements are currently being implemented in a new
experimental system which will be used for further research on reliable, fault tolerant,
fail-soft and damage resistant systems. The name of the new system is MICON
(Microcomputer-network).




The key component of this building block system will be a microprogrammable microprocessor
with two identical input/output ports as its main feature (figure 12). Because it can
look and act to two sides at the same time, it is called the Janus-processor. The second
port can be used in three different ways. It can be used for the connection of private
memory, which doubles the address space of the processor. Even more importantly, it
allows heavily accessed code and data to be stored locally and thereby reduces the nodal
bus contention.

With a peripheral device connected to the second port, the Janus-processor would serve as
an intelligent device controller and could take care of the appropriate device-monitor
at the same time.

The Janus-processor finds its most important application as an active communication link,
i.e. with the second port being directly or indirectly connected to the bus of another
node. The Janus-processor is able to handle the message transfer between the two nodes
all alone, a task which in the MCS has to be controlled by a complex and lengthy dialogue
between the master-processors of both nodes. While in the MCS the transfer of a message
to an adjacent node takes 4.5 ms plus an additional 33 µs/word, a transfer in MICON will
probably be reduced to something like 250 µs plus 10 µs/word. (The average instruction
execution time in both systems is about 3 µs.)
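The two transfer-time figures can be compared for a concrete message length. The following Python sketch assumes a simple linear model (fixed set-up time plus a per-word time), which is how the figures are quoted; the function names and the 100-word example are illustrative, and the MICON values are the estimates given above, not measurements.

```python
def transfer_time_us(words, setup_us, per_word_us):
    """Linear model: fixed set-up time plus a constant time per word, in microseconds."""
    return setup_us + per_word_us * words

def mcs_time_us(words):
    return transfer_time_us(words, setup_us=4500, per_word_us=33)   # 4.5 ms + 33 us/word

def micon_time_us(words):
    return transfer_time_us(words, setup_us=250, per_word_us=10)    # estimated values

words = 100
print(mcs_time_us(words), micon_time_us(words))                 # 7800 1250
print(round(mcs_time_us(words) / micon_time_us(words), 2))      # 6.24
```

For short messages the gain is dominated by the set-up time (a ratio of 18 for an empty message); for very long messages it approaches the per-word ratio of 3.3.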

A new design of the node-internal bus will be another major hardware improvement.
The nodal bus arbitration will be attached piecewise to all processors and memory
modules which are plugged into the bus and will therefore be totally decentralized. The
number of processors which can be plugged into the bus is restricted merely by the number
of open slots and may be as high as 16.

The changes to the bus control make it possible to distribute the functions of the
master-processor of the MCS to all processors of each node. The contention problem of the
master-processor is thus eliminated, too.

Acknowledgement: This research would not have been possible without the cooperation of
many coworkers. Besides W. Grünewald and G. Neumann, who have already been mentioned,
I would very much like to thank W. Jansen, who is taking care of the hardware, a
contribution which cannot be valued highly enough.


References 


Händler, W., 1973, "The concept of Macro-Pipelining with high availability",
El. Rechenanl. 15, Nr. 6, pp. 269-274

Hoare, C.A.R., 1978, "Communicating Sequential Processes",
CACM 21, Nr. 8, pp. 666-677

Ichbiah, J.D., et al., 1979, "Rationale for the Design of the ADA Programming Language",
SIGPLAN Notices, Vol. 14, Nr. 6, 11.4.2

v. Issendorff, H., Grünewald, W., 1980, "An adaptable Network for Functional Distributed
Systems", Conf. Proc. 7. Symp. on Comp. Arch., IEEE Cat. Nr. 80CH1494-4C, pp. 196-201

MASCOT, 1980, "The Official Handbook of Mascot", Mascot Suppl. Ass., RSRE, UK.

Neumann, G., Ackermann, R. and Grünewald, W., 1981, "Messungen zum Kommunikationsaufwand
für Prozesse in einem lokalen Rechnernetz", Ber. z. German Chapter of the ACM, Bd. 7

Walden, D.C., 1972, "A System for Interprocess Communication in a Resource Sharing
Computer Network", CACM 15, Nr. 4, pp. 221-230


DISCUSSIONS 
SESSION  II 


REFERENCE NO. OF PAPER: II-6
DISCUSSOR'S NAME: Jim McCuen, Hughes Aircraft
AUTHOR'S NAME: K. Shin

COMMENT: What are the criteria for the need for separate data and control buses? Can the buses
operating at 50 megabits employ a contention-type protocol?

AUTHOR'S REPLY: 1) To increase data bus bandwidth or to reduce bus contention. If you don't separate
them, all control signals (information) would have to be passed to the processors via the data bus.

2) No, not as of now. Presently we are considering the MIL STD 1553 serial bus, which
has a maximum bandwidth of 1 megabit/sec. But it may be feasible when fiber optics become available
for practical use. Note that if you don't separate the control bus from the data bus, the 1553 bus will
not have the full 1-megabit bandwidth for data passing.


REFERENCE NO. OF PAPER: II-6
DISCUSSOR'S NAME: Dr. van Keuk, AVP member
AUTHOR'S NAME: K. G. Shin

COMMENT: How do you estimate the importance of an atomic function? I feel this can be a difficult job
in a complex system with a low degree of redundancy.

AUTHOR'S REPLY: In practice it is not too difficult, although the decision may be to some extent
subjective. For example, it is reasonable to give more importance to an atomic function associated with
flight control than to one associated with navigation.


REFERENCE NO. OF PAPER: II-6
DISCUSSOR'S NAME: G. Scotti, Selenia, Italy
AUTHOR'S NAME: K. G. Shin

COMMENT: You have shown some graphs for Bus Request Profile and Average Bus Access Delay. On what
assumptions did you define the graphs, and did you have confirmation of the correctness of the
assumptions via simulation or practical measurements?

AUTHOR'S REPLY: The numbers shown in the graph don't mean anything significant. We assumed there are 20
bus requests over 20 time frames. This could be very bursty, i.e. all 20 requests during the first
frame and none thereafter; or a uniform distribution over the first two time frames, i.e. 10 requests
for each of the first two time frames and none thereafter; or one request for each of the 20 time
frames, etc. This is an arbitrary but reasonable example. Of course, bus access delay can be
computed for any bus request profile; therefore the graph in the paper has to be understood as a simple
but sensible hypothetical example. This graph is not obtained from real measurements and
does not have to be validated with such measurements, since bus access delay can be estimated. Any bus
request profile will be process- (or task-) dependent and a random variable.
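How an average bus access delay follows from a request profile can be sketched as below. The service model is an assumption made purely for this illustration and is not taken from the paper under discussion: the bus serves one request per time frame in arrival order, and the delay of a request is the number of frames it waits before being served. The three profiles are the ones named in the reply.

```python
def average_access_delay(profile):
    """Mean waiting time in frames; profile[i] is the number of requests arriving in frame i.

    Assumed model: one request is served per frame, first come first served.
    """
    free_at = 0          # first frame in which the bus is free again
    total = count = 0
    for frame, arrivals in enumerate(profile):
        for _ in range(arrivals):
            start = max(frame, free_at)
            total += start - frame
            free_at = start + 1
            count += 1
    return total / count if count else 0.0

bursty = [20] + [0] * 19          # all 20 requests in the first frame
two_frames = [10, 10] + [0] * 18  # 10 requests in each of the first two frames
uniform = [1] * 20                # one request per frame

print(average_access_delay(bursty))      # 9.5
print(average_access_delay(two_frames))  # 9.0
print(average_access_delay(uniform))     # 0.0
```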


REFERENCE NO. OF PAPER: II-7
DISCUSSOR'S NAME: H. Timmers, AVP member
AUTHOR'S NAME: S. Wright

COMMENT: Can you give some technical details about the microprocessor you are using?

AUTHOR'S REPLY: The main processor characteristics are outlined in the paper. Its important
characteristics for our application are its small size and high speed (more than 2000K operations per
second). There have been problems in achieving the correct instruction operation, but Intel has
promised a corrected version of the device for this August.


REFERENCE NO. OF PAPER: II-8
DISCUSSOR'S NAME: Dr. von Issendorff
AUTHOR'S NAME: K. Brammer

COMMENT: Would it not be important to include the question of vulnerability in the considerations of
which communication structure should be preferred?

AUTHOR'S REPLY: Yes, reduction of vulnerability is one of the major reasons for locally distributed
processing and must be considered with respect to the interconnecting network, too. Consider, for
instance, a single cut of a transmission line: in the case of the multiple access bus, you would lose
all communication; in the case of the layered star, it can mean the loss of a group of equipments; in
the case of the star, one unit is cut off; and in the all-to-all network, you lose only one two-party
line.

On the other hand, comparing the cabling cost (see fig. 6), one can easily afford at least a
double, redundant bus, and that is usually done. Of course, the two cables should run along different
tracks separated as much as possible.

There are several subsequent papers dealing with this problem in more detail.


REFERENCE NO. OF PAPER: II-8
DISCUSSOR'S NAME: H. Timmers, AVP member
AUTHOR'S NAME: K. Brammer

COMMENT: Can you compare the relative benefits of the ARINC 429 bussing concept with the MIL STD
1553B bus?

AUTHOR'S REPLY: In the context of this paper the basic difference is that the ARINC Standard allows
only one single source to speak on a given line, which is unidirectional (simplex), while the MIL
Standard allows multiple sources to use one bidirectional (partial duplex) line on a time division
basis. So the ARINC Standard requires a separate cable for each unit that has information to transmit,
whereas with the MIL Standard all transmitting units share the same cable.

Both Standards use a single twisted and shielded pair of wires for the channel line, and in both
cases the messages are broadcast to all listeners (multiple sink). The bit rate of the MIL Standard is
ten times higher than that of the ARINC Standard, but due to the time sharing and control overhead of
the former, the average data capacity allocatable to the message sources connected to a MIL bus may
easily drop below the value possible with ARINC. The absence of bus access management in the ARINC
concept facilitates system specification and integration, but ARINC definitely requires more cabling
(in general, n times as much as MIL if there are n sources). Also, while the number of transmitter
ports is equal for both standards, the number of receiver ports is much higher for ARINC: an equipment
listening to x other equipments needs only a single receiver port with MIL, but x receiver ports with
ARINC.
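The cabling and port counts stated in this reply can be written down directly. The Python sketch below is a simplified tally that ignores redundant buses and cable lengths; the function names and the example figures (10 sources, 4 listened-to equipments) are assumptions for illustration.

```python
def arinc_429_counts(sources, listened_to):
    """One simplex line per transmitting unit; one receiver port per source listened to."""
    return {"cables": sources, "receiver_ports": listened_to}

def mil_1553_counts(sources, listened_to):
    """All units share one time-division bus; a single receiver port suffices."""
    return {"cables": 1 if sources else 0, "receiver_ports": 1 if listened_to else 0}

# Example: 10 transmitting units, one equipment listening to 4 of them.
print(arinc_429_counts(10, 4))  # {'cables': 10, 'receiver_ports': 4}
print(mil_1553_counts(10, 4))   # {'cables': 1, 'receiver_ports': 1}
```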


REFERENCE NO. OF PAPER: II-8
DISCUSSOR'S NAME: Jim McCuen, Hughes
AUTHOR'S NAME: K. Brammer

COMMENT: I question the future use of ARINC 429 multiplexing due to the increased number of buses
required on the latest commercial aircraft, e.g., the 767 aircraft requires over 130 buses and one
avionics black box (LRU) requires 22 ARINC 429 receivers. The Airbus A310 provides another example of
how ARINC 429 is outdated.

AUTHOR'S REPLY: I understand that this is not a question but a comment. It illustrates some of the
points in the paper, thank you.


REFERENCE NO. OF PAPER: II-9
DISCUSSOR'S NAME: P. A. Bross, ESG
AUTHOR'S NAME: S. Maher

COMMENT: Why do you synchronize software for multiprocessors instead of using mailboxes, where
processors can access data asynchronously? Why did you use a serial bus for communication between the
processors instead of a parallel bus?

AUTHOR'S REPLY: (1) The synchronization and quantization of the software is desired in this
architecture because it shows potential for simplifying development, validation, and verification of
software in a distributed system. The "mailbox" approach to data access is not compatible with the
continuous reconfiguration concept. The state information matrix, however, might be considered a
"mailbox" where the "mail" is delivered immediately instead of having to go to the post office to get
it.

(2) Although a parallel bus has the potential for being much faster than a serial bus,
there are several reasons why we did not use a parallel bus. First, the number of interconnecting wires
would be very large and would greatly inhibit the degree to which the processing modules could be
physically distributed. Also, a failure in a single wire of the parallel bus would essentially cause
the entire bus to fail.




REFERENCE NO. OF PAPER: II-9
DISCUSSOR'S NAME: Alan Stem, Boeing Military
AUTHOR'S NAME: S. Maher

COMMENT: Questions concerning failure modes: What is to prevent an "autonomous controller" from seizing
control of the bus due to a failure of one of the microprocessors? What prevents one microprocessor
from writing bad data into all the SIM, adversely affecting flight safety?

AUTHOR'S REPLY: (1) Autonomous control, as implemented in this architecture, has no influence over the
actual transmission or reception of information. The transmitters and receivers are independent pieces
of hardwired digital logic, designed so that a transmitter can control only one bus at a time. The
microprocessor itself has no control over the transmission of data beyond supplying the data to the
transmitter input buffer.

(2) We have several methods of "filtering out" faults from the system. These include
self-test, hardware voting, watchdog timer, software voting, and "blackballing by peers", where a
consensus of the other processors in the system can eliminate a faulty processor. Another method of
"fault filtering" which shows great promise for supplying broad coverage and reducing overhead software
is the self-checking microprocessor pair (SCMP) implemented by Honeywell in a parallel effort to the
in-house program. The SCMP is simply a pair of tightly synchronized microprocessors configured as a
single processor. The outputs of each processor are compared bit-by-bit to detect faults. Any of the
"fault-filter" methods will offer a certain degree of coverage for the possible variations of a certain
type of fault, in this case a processor attempting to fill the state information matrix (SIM) with
erroneous information. Hopefully, the total fault filter will closely approach 100-percent coverage
for all possible faults.


REFERENCE NO. OF PAPER: II-9
DISCUSSOR'S NAME: R. W. MacPherson, O.N.O., Canada
AUTHOR'S NAME: S. Maher

COMMENT: Your system is highly redundant except for the clock. What happens if it fails? Do you have
"reconfigurable clocks"?

AUTHOR'S REPLY: The clock is redundant and is generated by bus termination circuits. For example, a
system with four buses, as we are implementing at the Flight Dynamics Laboratory, has four bus
termination circuits. Each bus termination circuit has four functions. First, simply to terminate a
data bus and its associated clock bus for noise suppression. Second, to monitor both the clock and data
bus and eliminate the bus in the event a failure is detected. Third, to generate a 1-MHz clock to
synchronize data transmission between processing modules. Finally, the bus termination circuit
generates a synchronization pulse every millisecond to synchronize the processing modules at the
"millimodules" boundary. Each processing module has a voting circuit requiring that at least two
synchronization pulses be present simultaneously before accepting the pulse. The bus termination
circuits also synchronize the synchronization pulses through a similar voting circuit.


REFERENCE NO. OF PAPER: II-9
DISCUSSOR'S NAME: B. Zempolich, USN
AUTHOR'S NAME: S. Maher

COMMENT: We have had problems with regard to where fault-tolerant design begins and ends. For
example, do you include the power supplies in your fault-tolerant conceptual design? Do you consider
the bounding of your fault-tolerant design to no single-point failures?

AUTHOR'S REPLY: The architecture described in this paper was intended to be a research effort aimed at
implementing a flight control computer using distributed processing techniques. The resources were not
available to study any larger segment of the flight control system, nor would it have been appropriate
to do so. Presently, there is much work being done in this area. Once we have completed trade-off
studies we should be able to understand the advantages and disadvantages of the many possible methods
of implementing a distributed fault-tolerant computer complex. We can then expand our efforts to
include larger segments of the system such as power supply, sensors, actuators, etc.


REFERENCE NO. OF PAPER: II-10
DISCUSSOR'S NAME: J. T. Martin, Ferranti
AUTHOR'S NAME: H. von Issendorff

COMMENT: The system was likened to MASCOT. A program running under a Mascot Kernel can be slowed by
real time interrupts. How does the system cope with real time interrupts and what effect do they have?

AUTHOR'S REPLY: There are no interrupts in our system. Each event coming in from the outside is
received by a monitor, which then may inform other processes by sending messages.


REFERENCE NO. OF PAPER: II-10
DISCUSSOR'S NAME: K. Shin, Rensselaer Polytechnic Institute
AUTHOR'S NAME: H. von Issendorff

COMMENT: Simultaneous message communication is obviously superior to sequential communication.
However, there must be a way of handling precedence constraints existing in process communication which
may force messages to be dealt with sequentially. This is needed even for the case when message passing
is the communication method.

AUTHOR'S REPLY: As already pointed out in the presentation, simultaneous communication is not allowed
if the contents of the messages depend on each other. The precedence constraint in our system is that
only blocks of messages with no data processing in between may be treated simultaneously. This
restriction is sufficient because each message has its own private buffer.




SAVANT - A DATABASE MANIPULATION TECHNIQUE FOR SYSTEM ARCHITECTURE
DESIGN VERIFICATION AND ANALYSIS

by

Dr A. A. Callaway

Flight Systems Department
Royal Aircraft Establishment
Farnborough, Hampshire
England


SUMMARY 

SAVANT - System Architecture Verification and Analysis Technique - is a computer program developed
within RAE specifically to provide a tool for automatic system design verification and analysis. Its
application is oriented towards loosely-coupled bus-connected systems but is not exclusively confined to
these. Flexibility has been built into the program to characterise aspects of the system architecture.

The SAVANT program provides the facilities for interactively initiating, extending, modifying, filing and
retrieving the database, which represents various facets of the system under investigation, and for
configuring a system from the database information. The system thus configured can be analysed in a number
of ways, and the analyses performed can suggest how the basic information should be modified in order to
correct errors and inconsistencies or to improve efficiency, and so on. A consistent system can be further
modified and tuned, although SAVANT still checks the validity of all operations performed. Finally, the user
is able to 'firm up' the system when it has reached a satisfactory state, producing design requirements and
a system description in a form which can be input as a schedule to a bus control processor.

1  INTRODUCTION 

The configuration of avionic systems has seen a move in recent years away from the concept of a
network-connected system controlled by a large central processor towards a more federated type of
architecture, with digital processing embedded in various subsystems and with the majority of system data
communicated by means of a multiplex data bus. A number of recent papers have justified this approach
(1, 2, 3), and the purpose of this paper is to introduce a software technique to assist in the design of
such systems.

The type of data bus which has become accepted for avionic system use is that known as Mil Std 1553B
(ref 4) in USA and Def Stan 00/18 (Part 2) (ref 5) in UK. This is a 1 Mbit/s command-response serial bus
where the system data traffic is under the software control of a bus controller. Now the flexibility which
is inherent in the partitioning of distributed processing means that decisions taken at an early design
stage for a system will have an effect on the volume and nature of the intercommunicated data. This,
together with the fact that the system data traffic is under software control, demands that an integrated
top-down approach is taken to the overall system design. Thus, it is important to investigate the total
system architecture at an early stage in the project so that the individual subsystem requirements can be
hierarchically derived from a common base.

At the outset of a design study, then, it is valuable to postulate, in a reasonable amount of detail,
the operational functions to be performed by the system, and the nature of the subsystems which comprise
these functions, together with estimates of the data flowing between the functional areas. Given this
initial system breakdown, it is then possible to subject it to analysis in order to obtain an early
indication of the correctness of the approach. Important factors to check can include:

Conformance - whether the postulated design functionally conforms to the requirement.

Consistency - whether data produced or required by one subsystem is consistent with the
capabilities or requirements of other subsystems, whether there are conflicts
in the production of data, and so on.

Completeness - whether all required subsystems exist, whether all required data is generated
and all generated data is used, etc.

Feasibility - whether requirements placed on subsystems are within their capabilities,
whether total system data flow produces acceptable data bus loading
estimates, and so on.
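Checks of this kind lend themselves to a simple database scan. The Python sketch below is a minimal illustration only and does not reproduce how SAVANT itself is implemented: the data layout, the subsystem names and the function name are all hypothetical. It reports required data that nobody generates, generated data that nobody uses, and data items produced by more than one subsystem.

```python
def check_data_flow(subsystems):
    """subsystems: name -> {"produces": [...], "requires": [...]} (hypothetical layout)."""
    producers = {}
    required = set()
    for name, spec in subsystems.items():
        for item in spec["produces"]:
            producers.setdefault(item, []).append(name)
        required.update(spec["requires"])
    produced = set(producers)
    return {
        "missing": sorted(required - produced),     # required but never generated
        "unused": sorted(produced - required),      # generated but never used
        "conflicts": sorted(i for i, src in producers.items() if len(src) > 1),
    }

example = {
    "nav":      {"produces": ["position", "velocity"], "requires": ["air_data"]},
    "ins":      {"produces": ["position"],             "requires": []},
    "air_data": {"produces": ["air_data", "altitude"], "requires": []},
    "display":  {"produces": [],                       "requires": ["position", "heading"]},
}
report = check_data_flow(example)
print(report)
# {'missing': ['heading'], 'unused': ['altitude', 'velocity'], 'conflicts': ['position']}
```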

It is clear that the earlier the stage of development at which problems can be identified, the less
they cost to resolve. It is also true that the more complex the proposed system, the greater will be the
potential benefits of early system design analysis. At the same time, the very complexity which prompts
this approach may result in a design analysis procedure which is extremely tedious, time-consuming and
error-prone in itself unless automatic methods employing computer assistance are adopted.

The systematic analysis of a large database is, of course, a task ideally suited to a digital computer,
which has an infallible memory and inexhaustible patience. Furthermore, once the decision is made to invoke
automatic assistance, then further benefits become apparent. As well as using the computer to trace errors
and inconsistencies in the proposed system, the existence of the database facilitates the trying out of
different configurations, trade-offs, etc, and the examination of the subsequent effects in an iterative
manner which would normally be too time-consuming if done manually. It can also provide an automatic
documentation service on the current and previous system configurations, and the forms of the reports can
be many and varied according to the needs of the consumer.


Once the system database has been processed automatically, further manipulation is capable of
tuning the resultant configuration into a form acceptable to the system designer, and the specification of
the system data traffic which resides in the database can be used automatically to generate bus control
schedules and subsystem interface requirements.

This paper, then, describes such a system analysis program which has been developed at RAE Farnborough
specifically to investigate problems of completeness, consistency and feasibility for a postulated avionic
system design. Its application is oriented towards loosely-coupled bus-connected systems but it is not
exclusively confined to these, and flexibility has been built into the program by including resettable
parameters which specify aspects of the intercommunication philosophy.

The technique is called SAVANT, which is an acronym for System Architecture Verification and Analysis
Technique. It is programmed in Coral 66 and was developed on a Prime 300 computer system. A design aim
was to make the program as transportable as possible, using no machine dependent features in the main body
of the source by extensive use of macro definitions. SAVANT is now operational on a PDP 11/34 system in
addition to the Prime 300.

2  PREPARING  TO  USE  SAVANT 

SAVANT is an interactive program, which means that the user communicates with the program by means of
a VDU terminal and keyboard, giving commands to the program which define the required operations and
responding to prompts and questions displayed in order to amplify or qualify the commands. Results and reports
are received directly on the terminal and can also be committed to file for future reference
and hard copy output.

The purpose of SAVANT is to provide an automatic tool to enable the data flows in a speculative system
design to be analysed and refined in an iterative manner so that errors and inconsistencies can be corrected,
the effects of various trade-offs can be examined and aspects of the feasibility of the proposed design can
be established at an early stage.

The SAVANT program operates on a database held in memory. The database represents various facets of
the system under investigation and SAVANT provides the facilities for creating, modifying, extending,
analysing, filing and retrieving the data. The database is divided into three segments: the reference
segment, the configured system segment and the messages segment.

The 'raw' system data which the user prepares forms the basis of the reference segment of the database.
In order to generate this, the user formulates a list of potential subsystems which may be included in the
system under investigation, although the term 'subsystem' is very much dependent on the interpretation which
the user wishes to place on it. For example, it may be a single identifiable subsystem, or it may be a
complete functional area which in reality would consist of a number of distinguishable subsystems. On the
other hand, several different 'subsystems' might in fact be different manifestations of the same subsystem
allowing for different modes of operation. Once the data is on the SAVANT database, a 'system' can be
configured from whichever of the known 'subsystems' the user wishes to nominate, and not necessarily all of
them.


Each subsystem is given a name which describes its function, such as 'INU' (inertial navigation unit),
'ADC' (air data computer), 'HUD 1' (one option of the head-up display subsystem) and so on. Having decided
on the subsystems, the user then prepares a list of all the data items which are required to be received by
each and the data items which each will produce for transmission to other subsystems, and each of these data
items is given a name. Thus, the INU might require to receive 'BARO ALT' and 'MACH' among its input data,
and may produce 'LATITUDE' and 'LONGITUDE' among its output data.

Each data item thus specified must be provided with an estimation of its iteration (update) rate and
its precision, either required, if it is input by the subsystem, or capable of being produced if it is
output by the subsystem. Also, the units in which the data is represented may be specified.

For the representation of rate, SAVANT uses the 'rate group' concept rather than absolute iteration
rates. By this method, the maximum data iteration rate in the system (which might, in practice, be 50, 64
or 100 Hz, say) is represented by Rate 1. Binary subdivisions of this rate are then expressed as Rate N,
so that Rate 2 is one half of Rate 1, Rate 3 is one half of Rate 2, and so on. The decision about absolute
rate values does not have to be made at this stage, and one of the SAVANT analyses can be to investigate the
effect on system data traffic of varying the value of Rate 1. Rate 0 is used to represent a direct connection
where the data is not transmitted on the data bus.
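
The rate group convention can be illustrated with a short Python sketch. This is only an illustration of the
halving rule described above, not SAVANT code (which is written in Coral 66); the function name and the
trial value of Rate 1 are assumptions made for the example.

```python
from typing import Optional


def rate_group_hz(rate1_hz: float, group: int) -> Optional[float]:
    """Absolute iteration rate in Hz for a rate group, given the value of Rate 1.

    Rate 1 is the maximum rate and each higher-numbered group is half the
    previous one. Rate 0 marks a direct connection that does not use the
    data bus, so no bus iteration rate applies and None is returned.
    """
    if group < 0:
        raise ValueError("rate group must be 0 or greater")
    if group == 0:
        return None
    return rate1_hz / 2 ** (group - 1)


# The rate groups stay fixed; only the trial value of Rate 1 changes.
rate1 = 100.0  # Hz, illustrative
print([rate_group_hz(rate1, n) for n in range(1, 5)])  # [100.0, 50.0, 25.0, 12.5]
```

Because every entry carries only a group number, changing the value of Rate 1 rescales all absolute rates at
once without touching the entries themselves.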

The precision value is simply the number of bits needed to represent the data quantity. Thus, accuracy,
range and resolution are comprehended within this figure, but it is felt that, together with the units
identification, this is adequate to express the precision attribute of the data quantity at this stage of
the description without introducing undue complexity.

The units identifier, like the subsystem and data name, is an alphanumeric character string. If a
data quantity is dimensionless, such as a ratio, or if units are not considered important to the analysis,
then the string can be null.

To summarise, then, in preparation for operating SAVANT, the user has formed a description of the
speculative system which comprises a number of data flow specifiers, each of which consists of:

Subsystem name
Data item name
Data flow direction (transmitted or received)
Data rate
Data precision
Data units


and it is such data flow specifiers which form the basis of the reference segment of the database. The way
the data is used in analysing the system design is described in the next section.
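
As a sketch of how one data flow specifier might be held as a record, the six fields can be written as a
small Python data class. The class and field names, and the sample entries with their rate, precision and
units values, are hypothetical and chosen only for illustration; they are not taken from SAVANT.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    TRANSMITTED = "T"
    RECEIVED = "R"


@dataclass(frozen=True)
class DataFlowSpecifier:
    subsystem: str        # subsystem name, e.g. "INU"
    data_item: str        # data item name, e.g. "LATITUDE"
    direction: Direction  # transmitted or received
    rate_group: int       # 0 = direct connection, 1 = fastest bus rate
    precision_bits: int   # number of bits needed to represent the quantity
    units: str = ""       # null string if dimensionless or unimportant


# A tiny reference segment with invented values.
reference_segment = [
    DataFlowSpecifier("ADC", "BARO ALT", Direction.TRANSMITTED, 2, 16, "FT"),
    DataFlowSpecifier("INU", "BARO ALT", Direction.RECEIVED, 3, 12, "FT"),
    DataFlowSpecifier("INU", "LATITUDE", Direction.TRANSMITTED, 2, 32, "DEG"),
]

for entry in reference_segment:
    print(entry.subsystem, entry.data_item, entry.direction.value)
```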

3  THE  OPERATION  OF  SAVANT

Depending on the phase of operation and the state of the database, the SAVANT program operates in one
of seven program states. These program states govern the tasks which the user is able to perform. The
seven states in which the user can operate are as follows:

EMPTY
OPEN
INCONSISTENT
CONFIGURED
FORMED
LIMITED
LIMITED FORMED

The full range of commands to which SAVANT responds comprises 52 major commands, some of which can be
further qualified in operation. Of this range, only a certain number are valid in each state, and the user
can at any time display the current program state and the menu of commands valid in that state by typing
'H'. If the user types a command which is not valid in the current state, or is simply not understood,
SAVANT comments on the fact, echoing the erroneous command and reminding the user of the listing option.

With the exceptions of the commands 'H' and 'STOP', all major commands which the user types consist
of 3-letter abbreviations of the actual commands. A list of all commands, showing the abbreviations and
the states in which they are valid, is given in Table 1.

Often during the course of command execution, the user is requested to supply further information,
such as subsystem or data names, file names, and the like. In such cases, if the user's response is
erroneous, such as giving a name which does not exist, or specifying a file for reading which contains the wrong
type of information, the user is warned of the error and SAVANT returns to the command level. The program
always recovers from error in this manner and never exits without giving the user a chance to file the data
on which he has been working.

The  operation  of  SAVANT  in  each  state  will  now  be  described. 

3.1  The  EMPTY  state. 

When the SAVANT program is started, the database area is initialised and the program enters the EMPTY
state. An operation which can only be done in this state is to reset any or all of the preset system
parameters. These are declared in SAVANT with the values characteristic of Mil Std 1553B, where appropriate,
so every time the program is run afresh these values will prevail, but once they are changed within a
run the new values prevail until changed again or until the program is stopped.

The parameters which can be changed are as follows, with the preset values given in parentheses:
lowest rate group (8), data word length (16 bits), number of words in a message (32), word overhead - eg,
sync, parity, etc - (4 bits), transmission bit rate (1 Mbit/s), number of addressable terminals (30),
typical message overhead for transfers involving the bus controller (2.6 words) and not involving the bus
controller (4.9 words).
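
The preset parameters can be pictured as a single record of defaults that a run may override. The sketch
below is an illustration in Python, not SAVANT's own declaration; the field names are invented, and the
default values are those listed above as best they can be read from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemParameters:
    lowest_rate_group: int = 8
    word_length_bits: int = 16
    words_per_message: int = 32
    word_overhead_bits: int = 4          # sync, parity, etc
    bit_rate_per_s: int = 1_000_000      # 1 Mbit/s
    addressable_terminals: int = 30
    overhead_words_with_bc: float = 2.6     # transfer involves the bus controller
    overhead_words_without_bc: float = 4.9  # transfer does not involve it


presets = SystemParameters()                     # values at program start
trial = replace(presets, words_per_message=16)   # a reset value for one run

print(presets.words_per_message, trial.words_per_message)
```

Keeping the presets immutable and deriving a modified copy mirrors the behaviour described: a fresh run
always starts from the preset values, while changes last only for the current run.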

The other operations which can be performed in the EMPTY state involve the input of data, either
directly from the terminal or from a disc file. Data input from the terminal comprises information on the
data flow specifiers detailed in Section 2, and this is entered in the reference segment of the database.
Multiple entries can be made with one command, and these can either be miscellaneous or specific to one
subsystem. In the latter case the subsystem name need only be typed once. The user is prompted for each
specific piece of information required. As soon as an operation is performed which places data in the
reference segment then the program state becomes OPEN.

It will be seen later that a reference database which exists within SAVANT can be saved as a disc
file, and such a file can also be input in the EMPTY state to set up the reference segment. Again, this
results in the program state becoming OPEN.

Another option in the EMPTY state is to input information from a disc file directly into the messages
segment of the database, in which case the program state becomes LIMITED. This is discussed in 3.6.

3.2  The  OPEN  state.

The OPEN state allows various operations to be performed on the reference segment. These fall into
several categories: listing, extension, modification, filing and state-changing.

The listing commands allow various lists to be produced. For example, one can list the reference
database entries in tabulated form, or one can list only those entries which relate to a specific subsystem.
One can produce a list of all subsystem names or a list of all data item names. Also, one can trace a data
item by listing all occurrences of that name in the reference segment, with the appropriate qualifying
information.

With all of the listing commands in SAVANT, if the list produced could exceed the capacity of the
terminal screen then the user is offered the option to pause on each page so that the information can be
examined at leisure. This option also allows the output then to be aborted rather than continued to the
next page. If the option is not exercised then the listing runs to completion without pausing, which may
be useful if a monitor file is being produced. All the listing commands of the OPEN state are also
available in the INCONSISTENT, CONFIGURED and FORMED states.


The reference segment can be extended by adding entries from terminal or file, using the same commands
as are available in the EMPTY state.

Modification of the reference entries can take several forms. Any subsystem or data item name can be
changed, either to a completely new name or to another name which already exists on the database. This
latter is useful for resolving spelling inconsistencies. Rate and precision entry values can be modified in
the following ways: the value for a specific entry can be changed, the entry being identified by the
subsystem and data names and the transmit/receive flag; the value for a particular data item can be generally
set/changed at every occurrence of the data item, and any specified rate or precision value can be changed
to another value either generally throughout the reference segment, or just for those entries relating to
a specified subsystem.

A units identifier can also be changed to a new or other existing name either generally throughout
the reference segment, or for every occurrence of a data item, or just for one specific entry.

Finally, among the modification commands, reference segment entries can be deleted in two ways.
Either a specific entry, identified by its subsystem and data names and transmit/receive flag, can be
deleted, or all entries relating to a specified data item can be deleted. If any deleting operation removes
the last remaining reference segment entry then the program state reverts to EMPTY.

The filing commands enable the current reference segment to be saved in one of two ways. Either the
complete reference segment can be filed or only those entries relating to a specified subsystem. Using the
latter command, a library of files relating to different subsystems can be established. When the files are
written they are provided with identification which can be checked as part of the reading-back operation.

There are two state-changing commands available in the OPEN state. Firstly, the database can be
cleared, in which case the program state reverts to EMPTY. The user is asked to confirm the intention to
clear since any work thus far performed will be lost if no filing has taken place. The second
state-changing command is that which requests a system to be configured from the subsystems known to the reference
segment. Here the user can specify that all subsystems are to be included, or a 'yes/no' indication can be
given as SAVANT offers each subsystem in turn.

Once SAVANT has ascertained the subsystems to be included in the configured system it formulates the
configured system segment of the database, using information derived from the reference segment. This
configured system segment contains information on the desired linking of data in the system, identifying all
transmitted data items, together with their rate, precision and units specifications, and all configured
receivers of each data item, together with their rate, precision and units requirements.

The data thus assembled is then checked for fatal inconsistencies - ie, those which preclude the
formation of valid messages between subsystems in the configured system. If such exist then the messages
segment of the database cannot be formulated and the program state becomes INCONSISTENT. If no fatal
inconsistencies exist then the messages segment, which comprises the valid traffic resulting from the data
linking operation, is formulated and the program state becomes CONFIGURED.

In neither of these states can the reference segment of the database be modified, since this must
always correlate with the system which has been configured.

3.3  The  INCONSISTENT  state. 

If the program state on configuration is found to be INCONSISTENT, then SAVANT automatically
generates a list of the fatal inconsistencies which have been found. These fall into four categories. Firstly,
the rate, precision or units requirement specified for a data item in a particular receiver may not be
consistent with the capability specified for the transmitter of the data item (ie, rate or precision too
high, or different units). Secondly, it may be found that a data item is transmitted by more than one
subsystem, or, thirdly, that a subsystem both transmits and receives the same data item. The fourth type of
fatal inconsistency is where the number of addressable terminals resulting from the data traffic would exceed
the terminal limit set.
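
The four categories lend themselves to a mechanical check. The following Python sketch shows one way such a
check could be written; it is not SAVANT's algorithm. The record layout and message wording are invented,
and several details are assumptions: a lower rate group number means a faster rate, rates are compared only
when the receiver is on the bus (group above 0), units must match exactly, and a subsystem counts as a bus
terminal if any of its entries has a non-zero rate group.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    subsystem: str
    item: str
    transmit: bool   # True = transmitted, False = received
    rate_group: int  # 1 = fastest, larger = slower, 0 = direct connection
    precision: int   # bits
    units: str


def fatal_inconsistencies(flows, terminal_limit=30):
    """Return one descriptive string per fatal inconsistency found."""
    problems = []
    senders = defaultdict(list)
    for f in flows:
        if f.transmit:
            senders[f.item].append(f)

    # Category 2: a data item transmitted by more than one subsystem.
    for item, txs in senders.items():
        if len(txs) > 1:
            names = ", ".join(t.subsystem for t in txs)
            problems.append(f"{item}: transmitted by several subsystems ({names})")

    for rx in (f for f in flows if not f.transmit):
        for tx in senders.get(rx.item, []):
            if tx.subsystem == rx.subsystem:
                # Category 3: one subsystem both transmits and receives the item.
                problems.append(f"{rx.item}: {rx.subsystem} both transmits and receives it")
                continue
            # Category 1: the receiver asks for more than the transmitter offers.
            if 0 < rx.rate_group < tx.rate_group:
                problems.append(f"{rx.item}: rate required by {rx.subsystem} too high")
            if rx.precision > tx.precision:
                problems.append(f"{rx.item}: precision required by {rx.subsystem} too high")
            if rx.units != tx.units:
                problems.append(f"{rx.item}: units differ for {rx.subsystem}")

    # Category 4: more bus terminals than the terminal limit allows.
    terminals = {f.subsystem for f in flows if f.rate_group != 0}
    if len(terminals) > terminal_limit:
        problems.append(f"{len(terminals)} terminals exceed the limit of {terminal_limit}")
    return problems


flows = [
    Flow("ADC", "MACH", True, 2, 12, ""),
    Flow("INU", "MACH", False, 1, 12, ""),  # wants Rate 1, only Rate 2 offered
]
for line in fatal_inconsistencies(flows):
    print(line)
```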

Any of these would preclude the formation of valid message traffic in the system, and the list is
useful in identifying where the reference segment needs to be corrected. Of course, such modification can
only be performed in the OPEN state, so the only state-changing command available in this state is to
dismantle the inconsistent configured system, in which case the state reverts to OPEN. No modification to
any of the database structures can be performed in the INCONSISTENT state. The listing and filing options
of the OPEN state are still available, and there are now three further listing commands.

One of these commands allows the user to re-display the list of fatal inconsistencies, and another is
used to generate a list of non-fatal inconsistencies, indicating a lack of completeness of data paths.
Here all data generated and not used in the configured system is listed, as is all data required but not
generated. This type of incompleteness is not fatal because it does not provide any dilemma in forming the
messages - the incomplete path is simply not included in the message structure - and it may be an
intentional condition of this configuration. The third new list command produces a general trace of data in the
configured system, listing each transmitted data item, together with its transmitting subsystem and
capabilities and all configured receivers with their requirements. The latter two listing commands are also valid
in the CONFIGURED state - the former is not applicable since there are no fatal inconsistencies if the
program is in that state.
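
The non-fatal (completeness) report amounts to two set differences. A minimal Python sketch follows,
assuming each entry is reduced to a (subsystem, item, direction) triple with direction 'T' or 'R'; the
function and key names are invented for the example.

```python
def completeness_report(flows):
    """List data generated but not used, and data required but not generated.

    flows: iterable of (subsystem, item, direction) with direction 'T' or 'R'.
    """
    produced = {item for _, item, direction in flows if direction == "T"}
    required = {item for _, item, direction in flows if direction == "R"}
    return {
        "generated_not_used": sorted(produced - required),
        "required_not_generated": sorted(required - produced),
    }


flows = [
    ("ADC", "MACH", "T"),
    ("ADC", "BARO ALT", "T"),
    ("INU", "BARO ALT", "R"),
    ("INU", "HEADING", "R"),
]
print(completeness_report(flows))
```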

It should be noted that the 'STOP' command is valid in all program states. This stops the SAVANT run
and returns control to the computer operating system. Before the command is executed the user is requested
to confirm the intention to stop since all information on the database will be lost unless filing has taken
place.


3.4  The  CONFIGURED  state. 

When a system is configured from the reference segment information which does not involve any fatal
inconsistency then the program state becomes CONFIGURED. As well as formulating the configured system
segment, SAVANT is also able to set up the messages segment of the database.

A message is a package of data words passing from a specific source subsystem to a specific sink
subsystem (a source/sink pair) at a specific rate. During configuration by SAVANT, if the total number of data
words satisfying this requirement in a particular case exceeds the value of the message length setting then
more than one actual message must be generated. Furthermore, if the precision requirement for a data item
is greater than the word length setting then more than one data word must be used to represent the quantity.
This is known as partitioning of data words. In SAVANT, the precision value of a data item can be as high
as 1024 bits (64 x 16-bit words), so it is possible, when formulating the design, to comprise a large package
of data under one generic name.
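
The two splitting rules are both ceiling divisions, which the Python sketch below makes explicit. It is
illustrative only: the function names and the 'NAME(i)' subscript notation are assumptions, and the order
in which words are packed is simply the order given.

```python
def partition_item(name, precision_bits, word_length=16):
    """Split one data item into as many data words as its precision needs."""
    n_words = max(1, -(-precision_bits // word_length))  # ceiling division
    if n_words == 1:
        return [name]
    # Partitioned words carry a subscript so each can be identified.
    return [f"{name}({i})" for i in range(1, n_words + 1)]


def pack_messages(items, word_length=16, words_per_message=32):
    """Pack the words for one source/sink pair and rate into messages.

    items: list of (name, precision_bits) in the order encountered.
    """
    words = []
    for name, bits in items:
        words.extend(partition_item(name, bits, word_length))
    return [words[i:i + words_per_message]
            for i in range(0, len(words), words_per_message)]


print(partition_item("LATITUDE", 32))              # two 16-bit words
print(len(pack_messages([("BLOCK", 1024)])))       # 64 words need two messages
```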

The partitioning of data words and the disposition of words into messages is performed automatically
during configuration, and the result forms the basis of the messages segment. There is one entry for each
message generated, and each entry contains references to the source and sink subsystems, rate, number of
words and the individual data word names. Partitioned data words are assigned subscripts so that they can
be individually identified during modification. Such modification can be performed, and further information
added to the messages segment, by commands available in the CONFIGURED state.

The operations which can be performed in this state fall into several categories: listing, analysis,
modification, filing and state-changing. As well as the listing commands available in the OPEN and
INCONSISTENT states, a new range of listing options becomes available in the CONFIGURED state. For example, it
is possible to list the names of configured subsystems, and this list includes indications as to whether
each subsystem is connected via the data bus and, if so, whether its terminal address has been set and what
the value is. The terminal address is required by Mil Std 1553B protocols, and one of the commands in this
state provides for the assignment of these values by the user.

Several listing options are derived from the configured system segment of the database. For example,
one can produce a list of all source/sink pairs together with all data items passing between each, each
data item being qualified by rate value, or one can list this information for one specified source/sink
pair. One can also generate a visual map of the data traffic between source/sink pairs, either displayed
on the terminal screen or directly into a file.

Other listing commands derive their information from the messages segment. A summary list of messages
can be generated which tabulates, for each message, the source, sink, rate, number of words and retry code.
The retry code is intended to be used for error recovery action by the bus controller in the actual system
as implemented and, since SAVANT may be used to generate bus control schedules, the user is able to assign
or change values of this code for any message whilst in the CONFIGURED state. In addition to the summary,
fuller details of all bus messages and all direct data packages can be listed, including identification of
the actual data words, or, alternatively, one can list this information for any one specific message which
needs to be examined.

The  analysis  commands  enable  the  user  to  analyse  the  message  structure  for  the  presence  of  message 
subsets,  and  to  calculate  the  data  bus  loading  percentages  which  would  result  from  implementation  of  the 
system  in  practice. 

The subset information is required so that the user may rationalise the message structure in
preparation for the automatic generation and assignment of subaddresses which takes place when the program is
advanced to the FORMED state. A subaddress is used to identify a particular message, or package of data
within a subsystem. In other words, the subaddress value can be used as a vector which accesses the
beginning of the package of data within the subsystem. This is particularly valid if a subsystem involves
processing and is likely to buffer its data in memory. In the Mil Std 1553B protocol the subaddress value
is part of the command word which is sent by the bus controller to a subsystem, and it is likely that the
subaddress for a particular message will be different in the source and sink subsystems.

Naturally, if a subsystem transmits an identical message - in terms of data content - to a number of
different receivers, then the contents of that message can be assigned a single subaddress for the source
subsystem. Furthermore, if the message sent to Terminal 1, say, is a subset of the message sent to
Terminal 2 then the subaddress for the shorter message within the source subsystem can still be the same as
that for the longer message since the word count field of the command word specifies the number of words in
the message. For this to be valid, bearing in mind that the subaddress points to the beginning of the data,
the words comprising the short message must be contiguous within the longer message and must occupy the
beginning of that message.
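
The condition for sharing a subaddress is that the shorter message is a leading run (a prefix) of the longer
one. A minimal Python sketch of that test follows; the function name and word names are hypothetical.

```python
def can_share_subaddress(short_msg, long_msg):
    """True if short_msg occupies the beginning of long_msg, word for word."""
    short_words = list(short_msg)
    return list(long_msg[:len(short_words)]) == short_words


long_msg = ["LATITUDE", "LONGITUDE", "HEADING"]
print(can_share_subaddress(["LATITUDE", "LONGITUDE"], long_msg))  # prefix
print(can_share_subaddress(["LONGITUDE", "HEADING"], long_msg))   # contiguous, wrong start
print(can_share_subaddress(["LATITUDE", "HEADING"], long_msg))    # not contiguous
```

Only the first case qualifies. The second shows why re-ordering words within a message can be worthwhile: a
subset that is contiguous but not at the start cannot share the subaddress until the longer message is
re-ordered.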

The facility to list subsets is, therefore, provided to allow the user to examine the relevant messages
and make appropriate use of the modification commands in order that the automatically assigned subaddresses
are as efficiently derived as possible.

An important measure of the feasibility of a proposed system is whether the data bus has the capacity
to handle all the required data traffic. The loading calculation command causes SAVANT to estimate the
loading percentages which would result from running the system in reality, using the set values for overheads
and transmission rate. The user is requested to specify the value in Hz represented by Rate 1 and to declare
which subsystem is to act as bus controller. If none of the known configured subsystems is indicated then
SAVANT assumes a dedicated controller which is not taking part in the actual message traffic. SAVANT
calculates and displays the load in words, including overheads, at each rate, and then displays three bus
loading percentages.

In order to comprehend these it is necessary to define the 'major frame' as the interval represented
by the lowest iteration rate in the system - ie, the iteration period of the complete message repertory,
and the 'minor cycle' as the highest iteration rate period. The first loading figure calculated, then, is
the long-term (major frame) average load, and then two peak (minor cycle) loading figures are displayed.
The first of these is calculated on the basis that all rate group messages are initiated in the same minor
cycle - eg, when the system is reset on start-up, say - and this is the peak lumped loading. The peak
distributed loading, on the other hand, assumes that the initiation of the different rate group messages is
staggered so that only the Rate 1 messages and those in one of the other rate groups occur in any minor
cycle. Thus, it may be possible to observe, for example, that a system whose average loading is acceptably
low would produce an impossibly high peak lumped loading percentage which is alleviated by distributing the
initiation of the different rate group messages.
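
The three figures can be sketched numerically as follows. This Python fragment is one reading of the
definitions above, not SAVANT's own calculation: it assumes the load at each rate is already expressed in
words including message overheads, that each word occupies the data word length plus the word overhead in
bits, and that the peak distributed figure takes Rate 1 together with the heaviest single other rate group.
The function name and the sample loads are invented.

```python
def bus_loading(words_per_rate, rate1_hz, word_length=16, word_overhead=4,
                bit_rate=1_000_000):
    """Return (average, peak lumped, peak distributed) loading in percent.

    words_per_rate: {rate_group: words per iteration, overheads included},
    for bus rate groups only (1 = fastest).
    """
    bits_per_word = word_length + word_overhead
    minor_cycle_bits = bit_rate / rate1_hz  # bus capacity of one minor cycle

    def percent(words):
        return 100.0 * words * bits_per_word / minor_cycle_bits

    # Long-term average: a Rate N load recurs once every 2**(N-1) minor cycles.
    average = percent(sum(w / 2 ** (n - 1) for n, w in words_per_rate.items()))
    # Peak lumped: every rate group is initiated in the same minor cycle.
    lumped = percent(sum(words_per_rate.values()))
    # Peak distributed: Rate 1 plus the heaviest one of the other rate groups.
    slower = [w for n, w in words_per_rate.items() if n > 1]
    distributed = percent(words_per_rate.get(1, 0) + max(slower, default=0))
    return average, lumped, distributed


# Invented loads: 100 words at Rate 1, 200 at Rate 2, 400 at Rate 3; Rate 1 = 50 Hz.
print(bus_loading({1: 100, 2: 200, 3: 400}, rate1_hz=50.0))
```

With these invented figures the average is well below the lumped peak, which is the situation the text
describes: staggering the slower rate groups lowers the worst-case minor cycle without changing the average.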

The use of the loading analysis on the configured system, particularly as the message structure is
modified, or as the reference segment is rationalised and the system re-configured, provides a vital check
on the feasibility of the design approach.

Two of the modification commands have already been mentioned - the ability to set and change terminal
addresses and retry codes. Two further modification commands are available in order to allow the user to
change the order of data words within messages or to redistribute data words between messages. SAVANT
formulates message contents in the order in which data items are encountered in the reference segment, the
result of which may not be satisfactory to the user. For example, it may be necessary to re-order a message
in order to rationalise subsets, or partitioned data may cross the boundary between two messages whereas
the designer would prefer it in one message, and so on.

There are three filing commands available in the CONFIGURED state. The two reference segment filing
options of the OPEN state are still valid, and it is now also possible to file a message structure for
future reference. In this case, only the information in the messages segment of the database is sent to
file, but this includes any terminal address and retry code assignments which might have been made and, of
course, takes account of any message re-formatting.

Of the two state-changing commands, one dismantles the configured system, taking the program back to
the OPEN state, thus facilitating further reference segment modification. The other advances the program
to the FORMED state.

3.5  The  FORMED  state 

In the FORMED state the database represents a finalised system configuration and is, therefore, no
longer available for modification. It is from this state that the user is able to generate schedule tables
for control of the system in reality, together with detailed information on data requirements in a subsystem
related form.

When the command is given to form the system, SAVANT automatically generates the source and sink
subaddresses for each message, making the best possible use of subset information, and implants these values
in the messages segment entries. There is a limit to the maximum number of subaddresses for a subsystem;
in Mil Std 1553B, for example, this limit is 31. Thus if, during the course of formation, a message
subaddress value for any subsystem would exceed this limit then the user is warned, the forming fails and the
program remains in the CONFIGURED state.
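The forming check can be pictured with a small sketch. It is not SAVANT's algorithm: it assumes, purely for illustration, that each message takes the next free transmit subaddress at its source and the next free receive subaddress at its sink, and it ignores the sharing made possible by subset information. The names `form_system` and `FormingError` are invented here; only the limit of 31 comes from the text.

```python
SUBADDRESS_LIMIT = 31  # limit quoted in the text for Mil Std 1553B


class FormingError(Exception):
    """Raised when a subsystem would need more subaddresses than allowed."""


def form_system(messages, limit=SUBADDRESS_LIMIT):
    """Assign subaddresses to a list of (source, sink) message pairs.

    Assumption: one transmit subaddress per message at the source and one
    receive subaddress per message at the sink, allocated in order.
    Returns a list of (source, tx_subaddress, sink, rx_subaddress).
    """
    last_tx = {}  # subsystem -> highest transmit subaddress used so far
    last_rx = {}  # subsystem -> highest receive subaddress used so far
    assigned = []
    for source, sink in messages:
        tx = last_tx.get(source, 0) + 1
        rx = last_rx.get(sink, 0) + 1
        if tx > limit:
            raise FormingError(f"{source}: transmit subaddress {tx} exceeds {limit}")
        if rx > limit:
            raise FormingError(f"{sink}: receive subaddress {rx} exceeds {limit}")
        last_tx[source], last_rx[sink] = tx, rx
        assigned.append((source, tx, sink, rx))
    return assigned
```

In this sketch a failure leaves the caller's message list untouched, which mirrors the program remaining in the CONFIGURED state when forming fails.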

This, then, is another check on the feasibility of the proposed system, and if forming fails then it
is up to the user to examine the message structure and the reference segment data in order to reduce the
total number of messages a subsystem has to handle. One way, for example, may be to reduce the number of
different rate group transfers so that messages may be coalesced.

The commands available in the FORMED state include all the listing and filing commands of the OPEN
and CONFIGURED states but there are no database modification commands of any sort. There is one new listing
command which displays a list of all subaddress assignments, and one new filing command which generates the
bus control schedule.

On receipt of this latter command, SAVANT requests the user to declare which subsystem is to act as
the bus controller and, as in the calculation of bus loading, assumes a dedicated controller if the name
supplied is not recognised. From the information in the messages segment, a table of bus control schedules
is created and sent to a disc file. The format for the schedule is flexible and can be changed to suit any
particular bus controller implementation although, at the present time, the flexibility is not parameterised.
It is intended that this should be so in the future.

Following the schedule filing, SAVANT then sends to another disc file details on each subsystem's
requirements, including its terminal address, the data content of all subaddresses, transmitted and
received, and all direct transfer information.

The only state-changing command in the FORMED state reverts the program to the CONFIGURED state.

3.6  The  LIMITED  and  LIMITED  FORMED  states. 

It was discussed in 3.4 how the message structure of a configured system can be saved as a disc file.
With SAVANT in the EMPTY state, such a file can be read back into the messages segment, in which case the
program state becomes LIMITED, which is a special case of the CONFIGURED state.

This is because although the messages segment exists it is backed up by neither the reference segment
nor the configured system segment, so the range of commands available is constrained. For example, the
only listing commands are those related to the messages segment: the listing of configured subsystems,
message summary, message details and the data traffic map, plus the listing of data item names. It is not
possible to revert to the OPEN state because there is no reference segment. The only reversion from the
LIMITED state is directly to the EMPTY state.




In the LIMITED state, however, the analysis and modification commands of the CONFIGURED state are
still available, so it is possible to modify the message structure, to perform loading analyses, to assign
terminal addresses, etc. It is also possible to form the system, in which case the program state becomes
LIMITED FORMED. Thus bus control schedules for the revised message structure can be generated. Reversion
from the LIMITED FORMED state is to LIMITED.

Figure  1  shows  the  relationship  of  all  the  SAVANT  program  states. 
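The state changes described in 3.4 to 3.6 can be summarised as a small transition table. This is only a sketch: it covers the transitions stated in the text above, the dictionary layout and the function name `next_state` are assumptions, and the command that takes LIMITED back to EMPTY is not named in the text, so "clear database" is a guess used purely for illustration.

```python
# (current state, command) -> next state, as described in sections 3.4 to 3.6.
TRANSITIONS = {
    ("CONFIGURED", "dismantle system"): "OPEN",
    ("CONFIGURED", "form system"): "FORMED",
    ("FORMED", "unform system"): "CONFIGURED",
    ("EMPTY", "input filed messages"): "LIMITED",
    ("LIMITED", "form system"): "LIMITED FORMED",
    ("LIMITED FORMED", "unform system"): "LIMITED",
    # Assumed command name: the text only says LIMITED reverts to EMPTY.
    ("LIMITED", "clear database"): "EMPTY",
}


def next_state(state, command):
    """Return the state reached by `command`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command!r} is not a state change from {state}") from None


# Read a filed message structure, form it, then unform it again.
state = "EMPTY"
for command in ("input filed messages", "form system", "unform system"):
    state = next_state(state, command)
```

Transitions involving the OPEN and INCONSISTENT states are described in earlier sections and are left out of this table.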

4  CONCLUSIONS 

This  paper  has  briefly  described  the  software  technique  for  system  analysis  known  as  SAVANT.  A  full 
report  on  its  development  and  application,  including  an  example  of  its  use,  has  been  published  (ref  6). 

Future avionic systems are certain to become more integrated, where the individual subsystem elements
must be regarded as performing co-operatively in order to provide their overall contribution to the system
task. The options open to the initial system designer are varied, and decisions made at an early stage
have a very significant effect on the future development of the system.

The volume of information which has to be considered at this early design stage, and the nature of
the analysis tasks which have to be done, demand that tools and techniques are developed which enable
automatic processing to play the part for which it is so clearly suited. The availability of such tools and
the contribution they can provide in easing the more menial design tasks in an unerring manner mean that
the initial design phase can be more speculative, trying out different ideas and gauging their effects.
This can only be of benefit to the resulting design.

The SAVANT technique described in this paper provides a facility which, it is believed, will prove
a useful part of the standard warehouse of support tools needed for the development of future avionic
systems.

Copyright (c), Controller HMSO, London 1981


REFERENCES

1  E C Gangl
   Time-division multiplexed data bus integration techniques.
   Proceedings, USAF Multiplex Data Bus Conference,
   Dayton, Ohio, November 1976.

2  A A Callaway
   Trends in digital data processing and system architecture.
   AGARD GCP Conference Proceedings 272,
   Ottawa, May 1979.

3  A A Callaway
   The influence of digital techniques on system architecture.
   Joint IEE/RAeS Symposium: Digital Avionics, promise and practice.
   London, March 1980.

4  US DOD
   Aircraft Internal Time Division Command/Response Multiplex Data Bus.
   Mil Std 1553B,
   21 September 1978.

5  Ministry of Defence
   Serial time division command/response multiplex data bus.
   Def Stan 00/18 (Part 2) Issue 1,
   26 April 1980.

6  A A Callaway
   SAVANT - A database manipulation technique for system architecture
   verification and design analysis.
   RAE TR80101,
   January 1981.

Table 1

SAVANT COMMANDS

Command                            Valid states

                                   E  O  I  C  F
H    - LIST VALID COMMANDS         E  O  I  C  F
RSP  - RESET PARAMETERS            E  .  .  .  .
ITE  - INPUT TERMINAL ENTRIES      E  O  .  .  .
IFE  - INPUT FILED ENTRIES         E  O  .  .  .
IFM  - INPUT FILED MESSAGES        E  .  .  .  .
LEN  - LIST ENTRIES                .  O  I  C  F
LSD  - LIST SUBSYSTEM DATA         .  O  I  C  F
LSN  - LIST SUBSYSTEM NAMES        .  O  I  C  F
LDN  - LIST DATA NAMES             .  O  I  C  F
TDP  - TRACE DATA PATH             .  O  I  C  F
CSN  - CHANGE SUBSYSTEM NAME       .  O  .  .  .
CDN  - CHANGE DATA NAME            .  O  .  .  .
CRE  - CHANGE RATE ENTRY           .  O  .  .  .
CRG  - CHANGE RATE GENERAL         .  O  .  .  .
CDR  - CHANGE DATA RATE            .  O  .  .  .
CSR  - CHANGE SUBSYSTEM RATE       .  O  .  .  .
CPE  - CHANGE PREC ENTRY           .  O  .  .  .
CPG  - CHANGE PREC GENERAL         .  O  .  .  .
CDP  - CHANGE DATA PREC            .  O  .  .  .
CSP  - CHANGE SUBSYSTEM PREC       .  O  .  .  .
CUE  - CHANGE UNITS ENTRY          .  O  .  .  .
CUG  - CHANGE UNITS GENERAL        .  O  .  .  .
CDU  - CHANGE DATA UNITS           .  O  .  .  .
DDR  - DELETE DATA REFERENCE       .  O  .  .  .
DOE  - DELETE ONE ENTRY            .  O  .  .  .
FEN  - FILE ENTRIES                .  O  I  C  F
FSD  - FILE SUBSYSTEM DATA         .  O  I  C  F
CLR  - CLEAR DATABASE              .  O  .  .  .
CFS  - CONFIGURE SYSTEM            .  O  .  .  .
LDP  - LIST DATA PATHS             .  .  I  C  F
LCS  - LIST CONFIGURED SUBSYSTEMS  .  .  I  C  F
LDT  - LIST DATA TRAFFIC           .  .  .  C  F
MDT  - MAP DATA TRAFFIC            .  .  .  C  F
SSP  - SOURCE SINK PAIR DATA       .  .  .  C  F
CCS  - CHECK CONSISTENCY           .  .  I  .  .
CCR  - CHECK CORRELATION           .  .  I  C  F
SUM  - SUMMARISE MESSAGES          .  .  .  C  F
LBM  - LIST BUS MESSAGES           .  .  .  C  F
LDD  - LIST DIRECT DATA            .  .  .  C  F
CLD  - CALCULATE LOADINGS          .  .  .  C  F
LOM  - LIST ONE MESSAGE            .  .  .  C  F
LSS  - LIST SUBSETS                .  .  .  C  F
SRC  - SET RETRY CODE              .  .  .  C  .
STA  - SET TERMINAL ADDRESS        .  .  .  C  .
ROM  - REORDER MESSAGE             .  .  .  C  .
RFM  - REFORMAT MESSAGES           .  .  .  C  .
FLM  - FILE MESSAGES               .  .  .  C  F
DMS  - DISMANTLE SYSTEM            .  .  I  C  .
FMS  - FORM SYSTEM                 .  .  .  C  .
LSA  - LIST SUBADDRESSES           .  .  .  .  F
FLS  - FILE SCHEDULES              .  .  .  .  F
UFS  - UNFORM SYSTEM               .  .  .  .  F
STOP (return to OS)                E  O  I  C  F

[L and LF columns: 19 commands are marked L and 5 are marked LF; the row positions of these marks are not recoverable.]

Key:  E  - EMPTY
      O  - OPEN
      I  - INCONSISTENT
      C  - CONFIGURED
      F  - FORMED
      L  - LIMITED
      LF - LIMITED FORMED


[Figure 1: relationship of the SAVANT program states]

Key to commands:

ITE - INPUT TERMINAL ENTRIES
IFE - INPUT FILED ENTRIES
IFM - INPUT FILED MESSAGES
CLR - CLEAR DATABASE
CFS - CONFIGURE SYSTEM
DMS - DISMANTLE SYSTEM
FMS - FORM SYSTEM
UFS - UNFORM SYSTEM



SIGNAL PROCESSING WITH SYSTOLIC ARRAYS*


R.W.  Priester 

Research  Triangle  Institute 
Research  Triangle  Park,  NC  27709 


K. Bromley
Naval Ocean Systems Center
Catalina Boulevard
San Diego, CA 92152


H.J. Whitehouse
Naval Ocean Systems Center
Catalina Boulevard
San Diego, CA 92152


J.B. Clary
Research Triangle Institute
Research Triangle Park, NC 27709


ABSTRACT 

This paper discusses the application of systolic array processors to signal processing problems that
are amenable to a matrix formulation. Systolic arrays are formed by providing nearest-neighbor
interconnections between a large number of elemental processors to form either a one- or two-dimensional
array. With the possible exception of boundary elements, each processing element performs identical
computations in synchronism with other elements in the array. A number of important problems for which
systolic arrays hold potential are mentioned and the systolic array processor definition, in a number of
its forms, is reviewed. When applied to strongly band-limited matrices, systolic array processors can be
characterized as highly efficient from the standpoint of both hardware utilization and algorithm time.
However, as the bandwidth becomes large, this high performance is degraded. In an effort to overcome
performance degradation, this paper introduces and evaluates a data transformation which, when applied to
an n x n dense matrix, results in an improved banded structure with attendant hardware savings. An
interesting feature of this transform is its invariance properties with respect to the ordering of output
time sequences and algorithm execution time. Another interesting aspect is its relation to the classical
Gauss-Seidel method of iteration.

It is shown that systolic array processors possess some efficient testability features which can be
exploited concurrently. These are briefly summarized.


*The work reported in this paper was sponsored by the Naval Ocean Systems Center, San Diego, CA, under
contract N66001-80-C-0118.




1.0  INTRODUCTION 

This  paper  discusses  the  application  of  systolic  array  architectures  to  signal  processing  problems. 

Introduced by Kung (1978), systolic array architectures provide the capability for realizing a number
of important matrix operations. In addition to achieving a high computation rate by means of pipelining
and concurrent computation, the architecture is a good candidate for implementation with VLSI (very large
scale integration) technology. If the matrices processed are characterized by a narrow bandwidth,
excellent hardware utilization efficiency can be achieved. However, in those cases where the matrix bandwidth
becomes appreciable, for instance in the case of square densely-populated matrices, hardware utilization
efficiency is degraded significantly. This paper addresses the problem of using systolic arrays to process
matrices whose structure is less constrained. A simple but effective data transform which can in some
instances significantly improve hardware utilization efficiency is introduced and developed.

The paper is organized as follows. Section 2.0 presents a brief and general discussion of several
problem areas where the systolic array architecture is of interest. Section 3.0 outlines the main features
of the systolic array architecture and only summarizes the extensive treatment given by Kung (1978) and
Mead (1980); this section is included only for purposes of completeness of presentation. The PRT (partial
row translation) data transform is introduced and developed in detail in Section 4.0. Section 4.0 also
quantitatively compares the efficiency of the original systolic array processor with that which results
from applying the PRT transform. These results provide a means for deciding when PRT is advantageous.
Matrix inversion is the topic of Section 5.0 while Section 6.0 briefly outlines an efficient technique that
is useful for testing some systolic array matrix processors.


2.0  MATRIX  OPERATIONS  IN  SIGNAL  PROCESSING  APPLICATIONS 

Matrix operations represent a significant portion of the computational burden encountered in many
signal processing applications. Adaptive filtering, data compression, beamforming, and cross-ambiguity
calculation represent problem areas where stable matrix analysis techniques are of current interest. In terms
of resources required for system implementation, these problems can be classified as memory intensive and
computation intensive. Construction of systems capable of providing the computations required for analysis
of the above problems must provide for operations such as matrix multiplication, inversion, addition and
various decompositions.

For example, in least squares approximation problems, one might encounter matrix multiplication,
matrix inversion, and/or singular value decomposition. The computational approach used in a particular
instance depends upon the numerical stability properties of the problem at hand. For instance, if the
order of a particular problem is sufficiently small, the Gauss normal equations might be solved by
performing a straightforward matrix inversion. However, in the solution of ill-conditioned systems commonly
encountered in large-scale problems, achieving a meaningful solution might require application of singular
value decomposition computations.

Speiser and Whitehouse (1980) discussed the signal processing problems mentioned above and considered
the applicability of competing architectures such as transversal filters, array processors, bus-organized
multiprocessors and systolic array architectures. Of these, the most promising architecture is that of the
systolic array which has the potential to support real-time implementation of the algorithms required in
order to address those problem areas mentioned in this section.


3.0  THE  SYSTOLIC  ARRAY  ARCHITECTURE 

In the interest of a self-contained presentation, the systolic array architecture will be outlined and
illustrated in this section. A thorough, comprehensive treatment can be found in Kung (1978) or in Mead
(1980). The systolic array architecture is founded almost exclusively upon a single computational
element - the inner product step processor - which implements the relation

$$ y^{k+1} = a_{k+1} \cdot x_{k+1} + y^{k}, \qquad k = 0, 1, 2, \ldots, n-1. \tag{1} $$

Systolic array processors are constructed by appropriately interconnecting a group of inner product step
processors. In the systolic array architecture, only nearest-neighbor processor communication is
permitted. For purposes of data communication and computation, each inner product step processor is
equipped with three data registers: $R_y$ (for $y$), $R_a$ (for $a_k$) and $R_x$ (for $x_k$). Each register has
two connections - one for input, the other for output. Kung (1978) defined two types of inner product step
processors which are illustrated in Fig. 1. These elemental processors can be connected in a number of
ways which provide the capability to perform various matrix operations such as matrix multiplication, L-U
decomposition of symmetric positive-definite matrices, and the solution of triangular linear systems of
equations.

A basic unit of time measure for both types of processors shown in Fig. 1 is defined as follows: (a)
the processor loads inputs $y^{k}$, $x_k$ and $a_k$ into $R_y$, $R_x$ and $R_a$ respectively, (b) $y^{k+1}$ is computed
according to (1), and (c) $y^{k+1}$, $x_k$ and $a_k$ are output.
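The load, compute and output cycle can be written down directly. The following is a minimal sketch of a single cell and of a chain of n steps accumulating relation (1); the function names are illustrative, and the pass-through of the a and x values follows the description of the output step above.

```python
def inner_product_step(y, a, x):
    """One time unit of an inner product step processor.

    Loads y, a and x, forms y + a * x, and outputs the new y together
    with the unchanged a and x values.
    """
    return y + a * x, a, x


def inner_product(a_row, x):
    """Apply n successive steps, starting from y = 0, as in relation (1)."""
    y = 0
    for a_k, x_k in zip(a_row, x):
        y, _, _ = inner_product_step(y, a_k, x_k)
    return y
```

For example, `inner_product([1, 2, 3], [4, 5, 6])` accumulates 4, then 14, then 32.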

As an example, a systolic array matrix-vector processor will be configured to form the product

$$ y = Ax \tag{2} $$

using a linearly connected group of Type 1 processors. The relations which must be implemented are as
follows

$$
\begin{aligned}
y_i^{k+1} &= a_{i,k+1}\, x_{k+1} + y_i^{k}, \qquad k = 0, 1, 2, \ldots, n-1 \\
y_i^{0} &= 0 \\
y_i &= y_i^{n}, \qquad i = 1, 2, \ldots, n.
\end{aligned} \tag{3}
$$

Fig. 2 illustrates the systolic array of processors, the element data arrangements and flow required to
evaluate (2) for the case where A is an n x n matrix with bandwidth w = p + q - 1 = 4. The $y_i$ enter the
array from the right as zero and accumulate so as to form the inner product of the ith row of A with vector
x which moves to the right after being input from the left. As the x and y vectors move through the array
in the manner noted, A is shifted downward such that elements along the main diagonal pass through $P_2$.
In general elements of A above and parallel to the main diagonal pass through processors to the left of
$P_2$. Similarly elements of A below and parallel to the main diagonal pass through processors to the right
of $P_2$. A detailed example illustrating the operation of this systolic array matrix-vector processor will
be presented in Section 4.0.
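The data flow just described can be imitated with a short time-stepped simulation. This is a sketch under stated assumptions rather than a reproduction of Fig. 2: successive elements of x and y are assumed to enter on alternate time steps, the band is described by hypothetical parameters `lower` and `upper` (numbers of sub- and superdiagonals), and the elements of A are looked up by index instead of being shifted physically through the cells.

```python
def systolic_matvec(A, x, lower, upper):
    """Simulate y = A x on a linear array of w = lower + upper + 1 cells.

    A is a square band matrix given as a list of rows. x moves to the
    right, y moves to the left, and cell c meets the diagonal
    i - j = c - upper, so the main diagonal passes through cell `upper`.
    """
    n = len(x)
    w = lower + upper + 1
    rx = [None] * w              # (j, x_j) currently held in each cell
    ry = [None] * w              # (i, partial y_i) currently held in each cell
    y = [None] * n
    y_delay = upper - lower      # entry time of y_0 relative to x_0
    t = min(0, y_delay)
    while None in y:
        # A completed y leaves the array from the left-most cell.
        if ry[0] is not None:
            i, acc = ry[0]
            y[i] = acc
        # x_j enters on the left at t = 2j; y_i enters on the right as zero.
        new_x = None
        if t >= 0 and t % 2 == 0 and t // 2 < n:
            new_x = (t // 2, x[t // 2])
        ty = t - y_delay
        new_y = None
        if ty >= 0 and ty % 2 == 0 and ty // 2 < n:
            new_y = (ty // 2, 0)
        rx = [new_x] + rx[:-1]   # shift x one cell to the right
        ry = ry[1:] + [new_y]    # shift y one cell to the left
        # Every cell holding both an x and a y performs one inner product step.
        for c in range(w):
            if rx[c] is not None and ry[c] is not None:
                j, xv = rx[c]
                i, acc = ry[c]
                ry[c] = (i, acc + A[i][j] * xv)
        t += 1
    return y
```

Because elements enter on alternate steps, each cell in this model is active at most every other time step, which is one way to see why utilization suffers as the band widens.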

Generalization of the linearly-connected systolic array to a two-dimensional orthogonally-connected
structure enables the evaluation of matrix-matrix products. A systolic array for evaluating

$$ C = AB \tag{4} $$

where all matrices are n x n is shown in Fig. 3. Matrix A is input to the systolic array in exactly the
same way as described earlier for the matrix-vector processor while columns of B are input, with
appropriate spatial shift to allow for A's time delay, into successive rows of the array. If B contains a large
number of columns this implementation can be inefficient even for strongly banded matrices. Kung (1978)
overcame this problem by devising the hexagonal-connected systolic array which is based upon the Type 2
processor of Fig. 1. An example of this processor is presented in Fig. 3(b) for the case (4) when A, B
and C are strongly banded. Note the direction of flow and orientation of A, B and C. Entries in C are
accumulated as this matrix is shifted upward from the bottom of the array, where the $c_{ij}$ enter with
zero value.

Using the array structures presented above, Kung (1978) was able to realize two additional important
matrix operations. Due to space limitations, these will only be mentioned here. A triangle equation
solver can be constructed using a linearly connected array of inner product step processors; however, it is
necessary to introduce a new processor capable of division. The resulting processor solves a nonsingular
triangular system of linear equations by back-substitution. Similarly, by adding special elements on the
upper portion of the periphery of the hexagonal array (Fig. 3b), Kung (1978) showed that one can obtain the
following matrix decomposition

$$ A = LU $$

where A is a symmetric, positive definite matrix,
L is lower triangular having 1s on the main diagonal
and U is upper triangular.

Therefore, this processor, when coupled with the triangle equation solver, can be used to solve a fairly
general class of simultaneous equations.

Table 1 summarizes the hardware requirements and algorithm execution time steps for the family of
systolic array processors defined by Kung. When considered from the standpoint of hardware uniformity, a
surprising degree of capability is realized by the systolic array architecture. For the case of strongly
banded matrix structures, this architecture is efficient in terms of both the quantity of hardware used and
in hardware utilization efficiency. However, if square dense matrices or matrices of more general
structure are considered, hardware utilization efficiency can be degraded considerably. This problem is

Table 1. Summary of Systolic Array Hardware and Algorithm Execution Time Requirements
for Some Matrix Problems.

Systolic Array            Problem                  No. of Processors    Algorithm
Configuration             Solved                   Required             Time

Linearly Connected        Matrix-Vector            w                    2n + w
Array                     Multiplication

Linearly Connected        Solution of              w                    2n + w
Array                     Triangular System

Orthogonally              Matrix-Matrix            n · min(w_A, w_B)    3n + min(w_A, w_B)
Connected Array           Multiplication

Hexagonally               Matrix-Matrix            w_A w_B              3n + min(w_A, w_B)
Connected Array           Multiplication

Modified Hexagonally      L-U Decomposition        p(q-1)               3n + min(p, q)
Connected Array           A = LU

Notes: (a) Matrices are assumed n x n with bandwidths w = p + q - 1. Subscripted w denotes
           bandwidth of indicated matrix.
       (b) Matrix-matrix multiplication: either C = AB or C' = A'B', where (') = transposition.



addressed in the next two sections of this paper where methods for improving implementation efficiency are
introduced and studied.


4.0  DEFINITION  AND  DEVELOPMENT  OF  THE  PRT  TRANSFORM

In this section the PRT (partial row translation) transform will be defined and some of the benefits
available from its application in connection with systolic arrays will be presented. It will be shown to
improve hardware utilization efficiency and in addition provide a hardware savings in the case of square
dense matrices.

Definition  of  the  PRT  Transform 


Consider the matrix-vector multiplication problem stated in (2) with A constrained to be n x n and
densely populated. Express A as a strictly subdiagonal part, $A_L$ (i.e. with no diagonal elements),
juxtaposed with $A_U$, the upper triangular part of A which contains the main diagonal elements of A. This
may be expressed as follows

$$ A = A_L + A_U. \tag{5} $$


Applying the PRT transform to (5) provides

$$ A_{PRT} = [\, A_U \;\; A_L \,]. \tag{6} $$


That is, $A_{PRT}$ is obtained from A simply by translating the first (i-1) elements in row i to the right
n positions within the row for i = 2, 3, ..., n. In the resulting n x (2n - 1) array, all elements not specified
by $A_U$ and the displaced $A_L$ are set to zero. Now, applying the PRT transform to (2) yields the
equivalent expression

$$ y = A_{PRT}\, x_{PRT} = A_{PRT} \begin{bmatrix} x \\ x_p \end{bmatrix} \tag{7} $$

where $x_p = (x_1, x_2, \ldots, x_{n-1})$. It is noted that the PRT converts a square array into a non-square array
with enhanced banded structure. The transform necessitates augmenting x with a partial copy, $x_p$. A
detailed example where A is 4 x 4 is shown in Fig. 4. Four processors are used and the required number of
time steps is eleven. These quantities compare favorably with Kung's systolic array which would use seven
processors and also eleven time steps. For n large, it follows that the PRT transform saves about n/2
inner product step processors with no increase in execution time. If the original systolic array were
designed such that immediately upon processing element $a_{nn}$, the values of y contained in the array
could be unloaded, a time advantage would result for this processor configuration. The corresponding PRT
based array, while saving about one-half the number of processors, would incur only about a 50% increase in
execution time.
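The transform is easy to check numerically. The sketch below builds the n x (2n - 1) array and the augmented vector from the verbal definition above and confirms that the product is unchanged; the function names are illustrative and 0-based indices are used, so row i has i elements to the left of its diagonal.

```python
def prt_transform(A):
    """Return the n x (2n - 1) PRT form of a square matrix A (list of rows).

    In row i (0-based) the i elements to the left of the diagonal are moved
    n positions to the right; every other unspecified entry is zero.
    """
    n = len(A)
    A_prt = [[0] * (2 * n - 1) for _ in range(n)]
    for i in range(n):
        for j in range(n):
            col = j if j >= i else j + n
            A_prt[i][col] = A[i][j]
    return A_prt


def prt_vector(x):
    """Augment x with the partial copy x_p = (x_1, ..., x_{n-1})."""
    return list(x) + list(x[:-1])


def matvec(M, v):
    """Plain matrix-vector product used to compare the two forms."""
    return [sum(m * e for m, e in zip(row, v)) for row in M]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, 2, 3, 4]
same = matvec(prt_transform(A), prt_vector(x)) == matvec(A, x)
```

Each row of the transformed array holds its n nonzero entries in consecutive columns starting at the diagonal, which is the enhanced banded structure referred to above.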


The PRT transform readily extends to the problem of evaluating the product of two square matrices as
expressed in (4). It can be shown that the resulting systolic array for this problem is identical to that
of Fig. 3a. The only difference occurs in the way A and B are input to the array. The PRT is applied to
A, which saves about $n^2/2$ processors, and the columns of B, input on the left side of the array, are
partially repeated as prescribed in (7). Due to the large number of connections which would be required to
immediately unload this two dimensional array, the PRT configured processor will evaluate the matrix-matrix
product without any time penalty compared with the original systolic array.

Although it will not be discussed here, the PRT transform can be advantageously applied to some
problems where non-square matrices are encountered.

Quantitative  Assessment  of  the  PRT  Transform 

The remainder of this section will be devoted to a quantitative comparison of the performance of the
systolic array processor proposed by Kung (1978) (hereafter called original and denoted in certain
instances by the subscript orig) with that of the PRT based structure (henceforth called alternate and
denoted by subscript alt). The comparisons to be made will be based upon the following three figures of
merit:

(a) Processor utilization efficiency $e_{orig}$ and $e_{alt}$.

(b) Space-Time product $(ST)_{orig}$ and $(ST)_{alt}$ where

    S = number of inner product step processors
    T = number of algorithm time steps.

(c) Overall figure of merit $F = e/(ST)$, $Q = F_{alt}/F_{orig}$.


In the comparisons which follow, no penalty or cost is assigned to implementing the PRT transform. Also it
is assumed that n is large.

First consider the matrix-vector problem which is shown for both processor configurations in Fig. 5.
Adjacent to each processor configuration expressions for e, S, and T are given. e is defined as the ratio
of active area to the total area as shown in the figure. Simply stated it is an approximate measure of the
proportion of algorithm time for which computations are performed. Only square matrices are considered
here with bandwidth w = p + q - 1. Note also that the comparisons made here assume processor
initialization as illustrated.

Fig. 6 presents plots of e as a function of the normalized bandwidth parameters y = p/n and x = q/n.
This figure is drawn under the assumption that the array of the original configuration may be unloaded
immediately after element $a_{nn}$ has been processed. Alternately, Fig. 7 presents the same information
except that immediate unloading of the original configuration is not allowed. The results show that the
capability to immediately unload the array is important when x, y → 1.0. Note that the original
configuration provides excellent efficiency for x and y both small, that is, for strongly banded matrices;
however, as x, y → 1.0 the alternate form is superior.

Now consider a comparison on the basis of (ST) product. Solving the relation $(ST)_{orig} = (ST)_{alt}$
provides the result plotted in Fig. 8. When the pair (x,y) lies above the curve, the alternate
configuration provides a smaller (ST) product.

Generally it will be desirable to maximize the quantity $F = e/(ST)$ for a given problem. Therefore,
Fig. 9 shows a plot of $Q = F_{alt}/F_{orig}$ versus y with x a parameter. Given x and y for a particular problem
these results clearly indicate the preferred processor configuration.

Attention is now directed to the matrix multiplication problem where it is required to evaluate C = AB
when both A and B are n x n dense matrices. For the sake of simplicity, the general case of banded
matrices will not be treated in this comparison. Three systolic array configurations will be considered.

(a) A PRT-based orthogonally-connected processor.

(b) The orthogonally-connected processor shown in Fig. 3(a).

(c) The hex-connected processor presented in Fig. 3(b).

The quantities of interest for comparing these three configurations (subsequently referred to as
configuration (a), (b) and (c)) are tabulated in Table 2. (Note in Table 2 that the double subscript on Q
is interpreted to mean $Q_{ab} = F_a/F_b$ where a and b refer to the configurations listed above). From these
results the PRT-based systolic array is seen to offer significant performance advantages with respect to
configurations (b) and (c) under the conditions specified.
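The ratios quoted in Table 2 can be reproduced from the tabulated T, S and e values. The sketch below assumes the table reads T = 5n, 5n, 4n; S = n^2, 2n^2, 4n^2; and e = 2/3, 1/2, 1/8 for configurations (a), (b) and (c), and uses F = e/(ST) as defined above. The names `CONFIGS`, `figure_of_merit` and `q_ratio` are illustrative.

```python
from fractions import Fraction

# Coefficients read from Table 2: T = t * n, S = s * n**2, efficiency e.
CONFIGS = {
    "a": {"t": 5, "s": 1, "e": Fraction(2, 3)},
    "b": {"t": 5, "s": 2, "e": Fraction(1, 2)},
    "c": {"t": 4, "s": 4, "e": Fraction(1, 8)},
}


def figure_of_merit(cfg, n=1):
    """F = e / (S * T) with S = s * n**2 and T = t * n."""
    return cfg["e"] / (cfg["s"] * n ** 2 * cfg["t"] * n)


def q_ratio(first, second, n=1):
    """Q = F_first / F_second; the factor n**3 cancels in the ratio."""
    return figure_of_merit(CONFIGS[first], n) / figure_of_merit(CONFIGS[second], n)


q_ab = q_ratio("a", "b")   # 8/3, about 2.7
q_ac = q_ratio("a", "c")   # 256/15, about 17
```

Since both S and T scale with powers of n in every configuration, the ratios do not depend on the matrix order.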


Table 2. Comparison of Systolic Array Configurations for
Matrix-Matrix Multiplication (all matrices n x n).

Quantity of     Configuration    Configuration    Configuration
Interest        (a)              (b)              (c)

T               5n               5n               4n
S               n^2              2n^2             4n^2
e               2/3              1/2              1/8
Q                                Q_ab ≈ 2.7       Q_ac ≈ 17


5.0  APPLICATIONS  OF  SYSTOLIC  ARRAYS  TO  MATRIX  INVERSION 

This section will consider both explicit and implicit methods for solving a given consistent set of
linear equations. By explicit it is meant that the inverse matrix is made available to the user, while
implicit is used to imply that only the solution vector is determined and made available.

The hexagonally connected systolic array mentioned earlier can be used to explicitly invert a given
symmetric, positive-definite matrix. The approach is discussed by Speiser and Whitehouse (1980) and can be
summarized as follows. First the L-U decomposition of the given matrix is formed using the hex-connected
systolic array. Then, using n appropriately interconnected triangle equation solvers, $L^{-1}$ can be
computed. In this step the input to the array of triangle equation solvers, i.e. the known input vectors
taken collectively, forms the identity matrix. $U^{-1}$ is computed in a similar manner, and finally the
inverse matrix is obtained by taking the matrix product $U^{-1}L^{-1}$. All of these steps can be
implemented using systolic arrays.
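
The sequence of steps above can be sketched in ordinary sequential Python. This is a minimal numerical illustration, not a model of the systolic hardware: it assumes a Doolittle factorization without pivoting (adequate for a symmetric, positive-definite matrix), and all function names and the 3 x 3 test matrix are hypothetical.

```python
def lu_decompose(a):
    """Doolittle factorization A = L U (unit diagonal in L, no pivoting)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def solve_lower(lower, b):
    """Forward substitution for L y = b."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(lower[i][j] * y[j] for j in range(i))) / lower[i][i]
    return y


def solve_upper(upper, b):
    """Back substitution for U x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))) / upper[i][i]
    return x


def invert_triangular(t, solver):
    """Invert a triangular matrix by solving against each identity column."""
    n = len(t)
    cols = [solver(t, [float(i == j) for i in range(n)]) for j in range(n)]
    # cols[j] is column j of the inverse; transpose into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(p, q):
    """Plain dense matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def explicit_inverse(a):
    """A^-1 = U^-1 L^-1, each factor obtained from triangular solves."""
    lower, upper = lu_decompose(a)
    return matmul(invert_triangular(upper, solve_upper),
                  invert_triangular(lower, solve_lower))


a = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]  # symmetric positive-definite
a_inv = explicit_inverse(a)
```

Feeding the identity columns to the triangular solvers mirrors the statement that the known input vectors, taken collectively, form the identity matrix.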

Implicit matrix inversion can be performed in several ways, the most direct consisting of L-U
decomposition followed by two executions using a triangle equation solver. That is, given

$Ax = b$, A and b known

$LUx = b$: LU decomposition step

$Ly = b$: solve for y using triangle equation solver

$Ux = y$: solve for x using triangle equation solver
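
The three steps listed above can be sketched as follows. This is a sequential Python illustration under the assumption that no pivoting is needed; the function names and the small test system are hypothetical and are not taken from the paper.

```python
def lu_factor(a):
    """Return (L, U) with A = L U, unit diagonal in L, no pivoting."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def implicit_solve(a, b):
    """Solve A x = b without ever forming the inverse of A."""
    n = len(b)
    lower, upper = lu_factor(a)          # LU decomposition step
    y = [0.0] * n
    for i in range(n):                   # L y = b (forward substitution)
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):         # U x = y (back substitution)
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


a = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b = [6.0, 8.0, 4.0]
x = implicit_solve(a, b)  # exact solution of this test system is (1, 1, 1)
```

Only the solution vector is produced, which is the sense in which the inversion is implicit.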

This method, while it does not explicitly provide $A^{-1}$, is generally more accurate than the explicit
method, which computes $x = A^{-1}b = U^{-1}L^{-1}b$ (Sameh, 1977). Other implicit techniques such as Jacobi's method,
Gauss-Seidel's method and the successive overrelaxation (SOR) method, as discussed by Dahlquist (1974), can
be realized with systolic arrays. Implementation of Gauss-Seidel's method is interesting because it is
closely related to the PRT transform. Consider the equation $Ax = b$, and factor A into the form
$A = D(L + I + U)$, where L and U are strictly lower and upper triangular matrices respectively (i.e., their
main diagonal elements are zero) and D is a diagonal matrix $D = \mathrm{diag}(a_{ii})$, $a_{ii} \neq 0$, $i = 1, 2, \ldots, n$.

Jacobi's method of iteration can be written in terms of these definitions as follows:

$x_i^{k+1} = (-L_i x^k - U_i x^k) + b_i/a_{ii}, \quad i = 1, 2, \ldots, n \qquad (8)$

where $L_i$ and $U_i$ denote the ith rows of L and U respectively. Implementation of (8) using either the
original or alternate forms for systolic array matrix-vector multiplication is straightforward, only
requiring insertion of zeros along the main diagonal and evaluation of the terms $b_i/a_{ii}$ outside the
array as an auxiliary computation. The equations defining Gauss-Seidel's method are as follows:

$x_i^{k+1} = (-L_i x^{k+1} - U_i x^k) + b_i/a_{ii}, \quad i = 1, 2, \ldots, n \qquad (9)$

Here the notation is identical to that in (8) except that in the term $L_i x^{k+1}$, $x^{k+1}$ represents only a
partially filled vector $(x_1^{k+1}, x_2^{k+1}, \ldots, x_{i-1}^{k+1}, 0, \ldots)$ which is "built up" as the computation proceeds.
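
The difference between (8) and (9) can be made concrete with a short sequential sketch. The Python below is an illustration only: the function names, the zero starting vector, the sweep count and the diagonally dominant test system are assumptions introduced here, and nothing in it models the array timing.

```python
def scale_system(a, b):
    """Split A = D(L + I + U): return strictly lower L, strictly upper U, and D^-1 b."""
    n = len(a)
    lower = [[a[i][j] / a[i][i] if j < i else 0.0 for j in range(n)] for i in range(n)]
    upper = [[a[i][j] / a[i][i] if j > i else 0.0 for j in range(n)] for i in range(n)]
    c = [b[i] / a[i][i] for i in range(n)]  # the terms b_i / a_ii
    return lower, upper, c


def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))


def jacobi_step(lower, upper, c, x):
    """Equation (8): every component uses only the previous estimate x^k."""
    return [-dot(lower[i], x) - dot(upper[i], x) + c[i] for i in range(len(x))]


def gauss_seidel_step(lower, upper, c, x):
    """Equation (9): L_i acts on the partially filled new vector."""
    n = len(x)
    x_new = [0.0] * n  # (x_1^{k+1}, ..., x_{i-1}^{k+1}, 0, ...) built up row by row
    for i in range(n):
        x_new[i] = -dot(lower[i], x_new) - dot(upper[i], x) + c[i]
    return x_new


def iterate(step, a, b, sweeps):
    """Apply one of the two update rules repeatedly from a zero initial estimate."""
    lower, upper, c = scale_system(a, b)
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = step(lower, upper, c, x)
    return x


a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]  # exact solution is (1, 1, 1)
x_jacobi = iterate(jacobi_step, a, b, 50)
x_gauss_seidel = iterate(gauss_seidel_step, a, b, 50)
```

In `gauss_seidel_step` the entries of `x_new` beyond position i-1 are still zero when row i is processed, which corresponds to the partially filled vector described above.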

Gauss-Seidel's iteration can be implemented in systolic array form by using the PRT transform. This is
illustrated in Fig. 10, which shows that the diagonal elements have been omitted and the terms $b_i/a_{ii}$ are
evaluated outside the array. Assuming that the computation is started with an initial estimate $x^k$, it can
be observed from Fig. 10 that $x^{k+1}$ will be output and available for processing by the strictly subdiagonal
elements L. (For a detailed example of this property see Fig. 4 and note that in the present case
$x_1^{k+1} = y_1$ is output at time step 5. Note also that this value of $y_1$ is required in time step 6 for
processing by $a_{21}$, which in the present case is $L_{21}$.) Since U always processes a backdated estimate, it can
be seen that the PRT transform, or some equivalent method, must be applied in order to realize
Gauss-Seidel's method using systolic arrays. That is, unless the elements of L can be moved to the input
side of the array where the $x^{k+1}$ are input, the pipelining effect of the array prohibits implementing
Gauss-Seidel's method. Therefore, the original form of the systolic array cannot, without modification, be
used to implement Gauss-Seidel's iterative method.

Note from Fig. 10 that Gauss-Seidel's implementation can provide extremely efficient utilization of
processor capability. Processor utilization efficiency, starting at 83%, monotonically increases toward 100%
as the number of iterations increases. Although not discussed earlier when matrix-vector processors were
considered, a form similar to that shown in Fig. 10 can be obtained for the problem $y = Ax$ where A is n x m
with $n \geq m$. For this case, input vector x is simply repeated the required number of times while the PRT
transform is applied to successive m x m partitions of A.
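
The partitioning described above can be illustrated in isolation. The following Python sketch shows only the row-block bookkeeping, with the same x re-entered for each m x m partition; it does not model the PRT transform or the array itself, and it assumes for simplicity that n is a multiple of m. The function names are hypothetical.

```python
def matvec(block, x):
    """Dense matrix-vector product for one block."""
    return [sum(a * v for a, v in zip(row, x)) for row in block]


def partitioned_matvec(a, x):
    """Compute y = A x for an n x m matrix by successive m x m row partitions."""
    m = len(x)
    if len(a) % m != 0:
        raise ValueError("this sketch assumes n is a multiple of m")
    y = []
    for start in range(0, len(a), m):
        block = a[start:start + m]   # one m x m partition of A
        y.extend(matvec(block, x))   # the same input vector x is repeated
    return y


a = [[1, 2], [3, 4], [5, 6], [7, 8]]  # n = 4, m = 2
x = [1, -1]
y = partitioned_matvec(a, x)
print(y)  # [-1, -1, -1, -1]
```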

The SOR method of solution by iteration is very similar to Gauss-Seidel's method, the most important
distinction being that the systolic array in this case computes the residual error, which is then weighted by
a relaxation parameter appropriately chosen to accelerate convergence.
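
A minimal sequential sketch of this residual-weighting idea is given below. The function name, the relaxation parameter value 1.1, the sweep count and the test system are illustrative assumptions; with a parameter of 1.0 the update reduces to Gauss-Seidel's method.

```python
def sor_solve(a, b, omega, sweeps):
    """Successive overrelaxation: weight each Gauss-Seidel residual by omega."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # x is updated in place, so entries j < i are already new values.
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel_value = (b[i] - sigma) / a[i][i]
            residual = gauss_seidel_value - x[i]
            x[i] += omega * residual
    return x


a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]  # exact solution is (1, 1, 1)
x_sor = sor_solve(a, b, omega=1.1, sweeps=50)
x_gs = sor_solve(a, b, omega=1.0, sweeps=50)  # plain Gauss-Seidel
```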


6.0  CONCURRENT  TESTING  OF  SYSTOLIC  ARRAY  PROCESSORS 

Utilization of any functional device in realizing important system features ultimately leads to
questions regarding reliability and maintainability properties. In this section interesting methods for
externally testing systolic arrays for proper operation will be considered. It is not practical to
consider reliability features here; therefore, only issues related to maintainability, namely testability,
will be considered. Only external methods for testing will be explored.

Consider the systolic array for performing a matrix-vector product originally proposed by Kung (1978).
Given the way in which the matrix rows pass through the processor array, a rather simple external test for
proper operation of the array would be to augment the given matrix by adding two check rows, one at the
top and another at the bottom. This is illustrated in Fig. 11, where the two additional rows must be
identical in order to facilitate the check. Note from Fig. 11 that if no $x_i = 0$ and no augmentation
element is zero, each processor will be checked in the process of performing the matrix-vector product.
The test is very simple since it requires only that $y_1$ be compared for equality with $y_{n+2}$.
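
The bookkeeping of the augmentation test can be sketched as follows. This Python illustration covers only the augmented product and the equality comparison; it does not model which processor handles which matrix element, and the fault is imitated by altering one output by hand. All names and numerical values are hypothetical.

```python
def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(v * w for v, w in zip(row, x)) for row in a]


def augment(a, check_row):
    """Add identical check rows at the top and bottom of the matrix."""
    return [list(check_row)] + [list(row) for row in a] + [list(check_row)]


def check_passes(y_aug):
    """The test: compare y_1 with y_{n+2} for equality."""
    return y_aug[0] == y_aug[-1]


a = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
x = [1, 2, 3]          # no component of x is zero
check_row = [1, 1, 1]  # no augmentation element is zero

y_aug = matvec(augment(a, check_row), x)
y = y_aug[1:-1]        # the wanted product, with the check outputs stripped
print(check_passes(y_aug), y)  # True [4, 10, 14]

# Imitate a fault that corrupts the last check output.
y_faulty = y_aug[:-1] + [y_aug[-1] + 1]
print(check_passes(y_faulty))  # False
```

Integer data are used so that the equality comparison is exact; with floating-point data a tolerance would be needed.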

Two additional processors are required to realize this test. It is interesting to examine the cost
required to implement this check in terms of added hardware and algorithm execution time. Let S represent
the hardware required to realize a processor in the array and t denote the time interval required for each
shift in passing the matrix through the processor. For an n x n dense matrix, and using the product
S x (computation time) as a measure of resources used, the efficiency is given by:

$\text{efficiency} = \frac{(S \cdot 2nt)\ \text{without test}}{[S(2n+2)t]\ \text{with test}} = \frac{n}{n+1} \approx 1 - \frac{1}{n}$


For n large, it follows that this is a very efficient test in terms of required resources.
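
Evaluating the resource ratio for a few matrix orders shows how quickly the overhead of the check rows shrinks. The short Python sketch below uses the ratio (S · 2nt)/[S(2n+2)t] from the text; the function name and the sample values of n are assumptions made for illustration.

```python
from fractions import Fraction


def check_row_efficiency(n):
    """Resources without the test divided by resources with it; S and t cancel."""
    return Fraction(2 * n, 2 * n + 2)


for n in (10, 100, 1000):
    print(n, float(check_row_efficiency(n)))
```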

With respect to test effectiveness, however, questions follow with regard to fault coverage. If x is
known to be dense and the augmentation does not use zero elements, the test will be good for detecting hard
failures. However, transient failures represent a problem for this approach.

The test method just described can be applied to matrix-matrix processors, although comparison of more
quantities is required. It also follows that this approach is applicable to the PRT transform. Note for
this case from Fig. 11, however, that for about n time steps no checks on the computation are performed.
This can be overcome by additional augmentations, appropriately interspersed, in the original matrix.


7.0  CONCLUSION 

Systolic arrays represent a potentially important means for implementing computations involving
large-scale matrices. The realization of a general matrix-oriented computing capability that is founded
upon a few standard modules using VLSI technology is appealing. However, as emphasized by Kung (1978),
minimization of wiring requirements (communication costs) is a central problem in this technology. The PRT
transform introduced in this paper can significantly reduce these costs for some problems. Of particular
importance is the fact that these savings can be realized in some cases without increasing algorithm time.

It has been shown that for n x n banded matrices the PRT-based systolic array and that originally
proposed by Kung (1978) are complementary in the sense that when one is efficient, the other form tends
toward lower efficiency. The PRT transform does not alter the original systolic array hardware definition.
The time-ordered outputs are invariant under this transform, the only changes appearing in the order of
accumulation of intermediate values before they are output at the array port(s).

Solution of linear, simultaneous equations by iteration methods using systolic arrays results in an
interesting interpretation of the PRT transform. The PRT or some equivalent transform appears necessary in
order to apply systolic arrays to Gauss-Seidel's method or to the SOR method.

A simple, efficient, though somewhat limited, testing technique was introduced for performing
external concurrent tests on systolic arrays. This topic, as well as the others considered in this paper,
is worthy of further study.


REFERENCES 

Dahlquist, G. and A. Björck, (1974), Numerical Methods, Prentice Hall, Inc., New Jersey.

Kung, H.T. and C.E. Leiserson, (1978), "Systolic Arrays (for VLSI)," Carnegie-Mellon University,
Pittsburgh, Pa., (last revised December, 1978).

Mead, C., and L. Conway, (1980), Introduction to VLSI Systems, Addison-Wesley, Reading, Massachusetts.

Sameh, A.H., (1977), "Numerical Parallel Algorithms--A Survey," in High Speed Computer and Algorithm
Organization, editors Kuck, Lawrie, and Sameh, Academic Press, NY.

Speiser, J.M. and H.J. Whitehouse, (1980), "Architectures for Real-Time Matrix Operations," GOMAC
(Government Microcircuit Applications Conference) Digest of Papers, Houston, Texas.




Fig. 7. Processor Utilization Efficiency Versus Matrix Bandwidth Without Immediate Unloading
Capability.

Fig. 8. Values of x,y Which Satisfy the Equality $(ST)_{\mathrm{orig}} = (ST)_{\mathrm{alt}}$ Without Immediate
Unloading Capability.

Fig. 9. Figure of Merit $Q = F_{\mathrm{alt}}/F_{\mathrm{orig}}$ Without Immediate Unloading Capability.

Fig. 10. Three Iterations of Gauss-Seidel's Method on a Systolic Array with External Computations
Performed in Block Labeled 7 (Initial Estimate = $x^1$).

Fig. 11. Concurrent Testing of Systolic Array Matrix-Vector Processor by Augmentation Method.
Test implementation: compare $y_1$ with $y_{n+2}$ for equality.




ECONOMIC CONSIDERATIONS FOR REAL-TIME NAVAL AIRCRAFT/AVIONIC
DISTRIBUTED COMPUTER CONTROL SYSTEMS

BERNARD A. ZEMPOLICH

DEPUTY TECHNOLOGY ADMINISTRATOR FOR COMMAND, CONTROL AND GUIDANCE
RESEARCH AND TECHNOLOGY GROUP
NAVAL AIR SYSTEMS COMMAND, WASHINGTON, D. C. 20361

SUMMARY 

Using naval aircraft/avionic systems as examples, economic considerations for Distributed Computer Control
Systems (DCCS) are discussed. Centralized, distributed and federated processing architectures are used as
the primary set of system alternatives from which economic factors are developed. Technical, schedule
and financial risks for the system architectures are presented. Standardization of computer hardware and
software is examined from the economic viewpoint and in terms of other related risk factors. The economic
impact of subsequent logistic support for standardized computer hardware and software versus non-standard
products is identified. System considerations such as reliability, maintainability, availability, built-in-test,
fault tolerance, and redundancy are examined from the standpoint of resources available to design and
develop the DCCS, and also from the viewpoint of the economic impact of failure of the DCCS to perform as
expected. The economic impact of external factors such as the rate of technology advancement, technology
independence, limited production runs, and the general lack of economic leverage upon the market is
examined and related to the life-cycle support requirements of the DCCS.

1.  INTRODUCTION 

The inherent nature of microelectronic circuitry is that it lends itself to digital design techniques with
ease. This attribute, coupled with the microprocessor, provides a powerful design capability to developers
of aircraft/avionic systems. Today, powerful microcomputers can be embedded directly into each aircraft/
avionic subsystem with little if any impact on weight and volume. With this capability, top-down structured
aircraft/avionic systems based on distributed processing architectures have become implementable
and cost effective.

This paper addresses economic considerations associated with the design of real-time aircraft/avionic
Distributed Computer Control Systems (DCCS) for future Naval aviation Avionic System Architectures. The
aircraft as a DCCS is examined based on stated aircraft mission and avionic system requirements. DCCS
design options and alternatives such as physical implementations and alternative processing architectures,
standardization, commonality, reliability, maintainability, and availability are analyzed from the economic
viewpoint.

In addition to the options and alternatives available to the developing activity, there are many external
factors which affect the design of an Avionic System Architecture and over which the developer has
little or no control. Among them are the rate of technology advancement, technology dependence and
independence, and the general lack of economic leverage by the developers over the products of the solid
state industry. These factors are addressed from the viewpoint that management must be aware of their
potential impact on the design of a DCCS.

2.  ECONOMIC  CONSIDERATIONS 

2.1  DCCS  SYSTEM  DESIGN  OPTIONS  AND  ALTERNATIVES 

As with most engineering efforts, the design of an aircraft DCCS allows the developer to exercise a number
of options, all of which have inter-related technical, schedule, and economic (cost) risks. DCCS design
options and alternatives generally fall into two categories: those factors over which the designer has
direct control and those factors over which there is little or no control by the developing activity.

DCCS considerations over which the developer has control include: physical implementations; alternative
processing architectures; standardization and commonality; and reliability, maintainability, and
availability. Factors over which the developer has little or no control are all economic in nature. Among
these external considerations which impact the design of a DCCS are the following: the rate of technology
advancement, technology independence, and lack of economic leverage in the marketplace.

2.2  PHYSICAL  IMPLEMENTATIONS 

As stated previously, once the primary mission for an aircraft is established, the Avionic System
Architecture can be decomposed into functional requirements. In a similar fashion, subsystems can be
partitioned into various physical implementations. There are three basic equipment physical implementation
alternatives: the "black box" approach, the form, fit and function (3F) approach, and the integrated
technologies concept.

With the black box approach, all equipment procurements over the life-cycle of the aircraft are bought to
a set of specifications which detail not only the function and form, but also the internal configuration:
electronic, electromechanical, and packaging. Once the desired performance of the production units is
established, subsequent procurements usually have minimum technical and schedule risk. Quantity of units
to be bought per unit of time is the dominant economic factor in the procurement of black box
implementations of avionic equipment. Multiple suppliers can also be considered a major force in price
determination, as the competitive atmosphere tends to keep the per unit cost of the equipment down. The
assumption here, of course, is that alternate sources have the capability to produce the equipment with no
technical and/or cost problems. Lastly, with the black box approach, long-term logistic considerations
(which have a great impact on the life-cycle costs of the aircraft) can be established after the equipment
reaches production maturity.




A second physical implementation alternative for avionic equipment is that of form, fit and function
(3F). With the 3F approach, procurements of equipment are made based on a set of specifications which
detail the required physical dimensions as well as the electronic and electromechanical interfaces. The
technologies of the assemblies within the unit, on the other hand, are allowed to vary or "float". The
economic value of the 3F approach rests mainly with the options open to the supplier in having to meet
only the 3F specifications. In essence, the supplier is free to make maximum use of his particular
resources, design approaches, and manufacturing facilities. It is normal to expect that there is the
potential for cost savings through the use of the 3F approach in that it permits more suppliers to bid.
However, there is an economic shortcoming of the 3F approach in that it does not readily lend itself to
long-term logistic gains and planning. This shortcoming may be minimized if the alternative supply
source were to use components and/or parts already in the customer's inventory.

In both the black box and 3F approaches, each avionic unit performs a fixed specific function. At the
other end of the spectrum, the Avionic System Architecture can be partitioned along the lines of integrated
technologies in which functions are performed by generic task areas such as data processing,
communications, navigation, or controls and displays. In this instance, advanced technologies are used
in an integrated fashion such that any one given part of the subsystem is capable of performing different
functions at different times. Specifically, with the integrated technologies implementation, the functional
elements are all electronically reconfigurable. While this concept has considerable potential
performance and economic merit, it has yet to be fully exploited in avionic applications, and thus the
risks are not yet well established.

Regardless of the alternative chosen, the selection of the physical implementations of aircraft/avionic
equipment(s) is a fundamental design decision which has major technical and management impact during the
development phase as well as during the operational life of the aircraft, for this decision dictates
life-cycle logistic support approaches for the system such as depot repair, module "throw-away" concepts,
or factory repair and maintenance.

If the decision regarding which physical implementation alternative should be selected could be made on
the considerations just addressed, the choice would be reduced solely to a comparison of risks. Unfortunately,
the choice is also dependent to a large degree on the proposed aircraft installation. Specifically, is
the installation of the DCCS to be made in an existing operational aircraft or
in a new airframe? With a new airframe, the weight, volume, and location of the equipment are normally
determined concurrently with the development of the aircraft; thus there is a degree of design latitude
allowed in the physical integration of the aircraft/avionic subsystem. On the other hand, with an
existing airframe, there are a number of significant restrictions on the installation of a newly designed
DCCS because of the need to conform to existing physical conditions.

The importance of installation options cannot be overstated. Restrictions that may have to be faced when
installing equipment into existing aircraft may very well prevent an optimal combination of airframe and
on-board aircraft/avionic subsystems from a logistic viewpoint. Needless to say, logistics considerations
are for all practical purposes economic considerations, and if experience to date is any measure, the
costs of lifetime logistical support far exceed the non-recurring development costs.

2.3  ALTERNATIVE  PROCESSING  ARCHITECTURES 

The modern aircraft/avionic DCCS will be required to handle a wide variety of tasks ranging from complex,
high speed signal processing to simple input/output formatting and control. Additionally, fault-tolerance
concepts demand that many of the processing elements within the DCCS be capable of reprogramming during
the operational mission. The overall processing architecture must therefore support the synchronization,
control, configuration, reconfiguration, and fault-detection of all processors in the DCCS. Furthermore,
to minimize architectural problems, both the hardware and the software must be functionally partitioned
in such a manner that the interface complexity is manageable, and the design and implementation of each
unit processor is maintained in as independent a manner as is possible.

There exists a variety of processing architectures which can be utilized to design an aircraft/avionic
DCCS with the performance capabilities just identified. It should be noted, however, that each alternative
has attached to its use a unique set of technical, schedule, and financial risk factors. Figure 1,
Processing Architecture Alternative Comparison, lists a number of available processing architecture
options and identifies the associated risk factors. Risks are stated in low, medium, and high terms
because there does not exist a statistical data base from which precise numerical values can be derived.

Unfortunately, the procedure for selecting a specific processing architecture is not solely a matter of
looking at the risk factors inherent in the individual architectures and determining what is an acceptable
composite level of overall risk to the developer. For example, the Avionic System Architecture
Considerations identified in Table 1 also weigh heavily upon the decision concerning which processing
architecture is "best" for a specific application. The necessity for having to take into consideration both
the processing architecture alternatives as well as other Avionic System Architecture factors provides the
developing activity with a myriad of possible combinations from which to choose during the design
of the DCCS. The technical management task required to separate these combinations into a set of
hierarchically structured options based upon a well understood set of selection criteria is complex unto
itself.

Because of the large number of interrelated factors which affect the selection of a processing
configuration for a specific Avionic System Architecture and the lack of a historical cost data base, one can
only address in general terms the economic considerations of the various processing alternatives. Even though
economic considerations can only be addressed in general terms, they should not be interpreted as being
superficial, lacking in importance, or restricted to only one architectural choice. Even as
incomplete as the cost data are at this point in time, trends can be drawn from experiences with the
individual requirements of current aircraft/avionic systems. Examples of considerations which have
significant impact upon the life-cycle cost of a DCCS and require detailed management attention by the
developing activity during the project planning phase are: degree of system integration, degree of
partitioning of the system, software, firmware, and hardware trade-offs, and software cost/complexity.

2.4  DEGREE  OF  SYSTEM  INTEGRATION 

This issue addresses the degree of total system integration of the Avionic System Architecture. For
example, should the categories or groups of subsystems identified earlier be placed on a single
high-speed data bus, or should each group have its own dedicated data bus to perform functions particular to
the individual grouping of subsystems? A specific example of the dedicated data bus would be to keep all
vehicle-related subsystems segregated for safety-of-flight reasons. It can be anticipated that if there
is one high-speed data bus throughout the aircraft, then the complexity of controlling the data bus and
performing real-time executive and interrupt functions would be increased dramatically. In turn,
software-related costs (design, test and documentation) would increase significantly, if not proportionately,
with the degree of integration. This conclusion is based on the fact that cost experience (in terms of
dollars per instruction) with operationally deployed aircraft systems to date has shown that the real-time
executive and I/O routines cost much more than application programs and test and diagnostic routines.

2.5 DEGREE OF PARTITIONING OF THE SYSTEM


As stated earlier, future aircraft DCCS's must be designed using a structured process of decomposition
into software, firmware, and hardware processing modules. In future aircraft, the degree of distribution
(partitioning) of computing, control, and conversion functions will be dependent on the availability of
inexpensive and physically diminutive hardware elements, namely microprocessors and microcomputers. It
should be noted, however, that while the use of a central computer complex to provide functional digital
control of an aircraft has deficiencies due to the multiplicity of tasks which must be performed in one
machine, the DCCS has yet to face the same problems while performing similar tasks with as many as
150 to 200 machines (micros).

2.6  SOFTWARE,  HARDWARE,  AND  FIRMWARE  TRADE-OFFS 

The programmable digital computer allows in-service functional change without impacting the associated
hardware, except where additional memory is required. With the recent introduction of firmware, the
"best of two worlds" is available. Furthermore, the options for committal of functions to firmware
implementation as opposed to software are unbounded in number. Key to any decision-making process as to
whether or not to put a function into firmware is when one should freeze the software program design and
how often, if ever, the program is going to have to be changed throughout the operational lifetime
of the system. Any misjudgement on the proper timing for freezing the program into firmware, or
miscalculation of the number of times that the firmware will require subsequent change, will result in major
increases in development and support costs.

2.7  SOFTWARE  COST/COMPLEXITY 

In the centralized processing architecture, the cost and complexity of Applications/Control and Input/
Output programming rise exponentially as the throughput and memory of the centralized computer approach
their maximum (see Fig. 2). On the other hand, with the distributed processing architecture, the Cost/
Complexity at near zero percent (0%) distribution is the same as at one hundred percent (100%) utilization of
a centralized computer system. As the degree of distribution (i.e., partitioning) is increased, each
application software module becomes more independent and has less effect on the execution of the total
on-board system processing (program). The I/O program, however, becomes more complex since more processing
elements (micros) must be interfaced via the data bus structure. The data availability and I/O control
become the dominant factor, ultimately following the I/O program curve of the centralized computer system
in rising Cost/Complexity (see Fig. 3). The sum of the software trends indicates that there is probably a
point at which partitioning may be optimal. As is self-evident from Fig. 3, at either end of the
percentage distribution spectrum, the worst of both situations may exist.

2.8 STANDARDIZATION AND COMMONALITY

It is the author's opinion that no other area of the data processing field is more complex in scope and
controversial in nature than the area of standardization. Many professionals in the field of data
processing do not agree that standardization has both technical and cost merit. This lack of consensus
on the worth of standardization is due to the naturally opposing views of computer system users and the
developers of computer systems. The user views standardization as a means of management control over
development, risk, and system life-cycle cost, while the developer and designer, on the other hand,
view standardization requirements as an unnecessary restriction on technical creativity. Many developers
also counter the user's position that proliferation of computer equipment and software is a major
life-cycle cost burden with the claim that, given design freedom during the development phase of a new
system, they would introduce new technologies which would be cost-effective as well as having increased
performance capability over existing operational systems. Unfortunately, there is a tendency amongst
proponents of this development philosophy not to mention that new designs also give rise to normal
self-vested interests, such as increased profits and keeping the in-house design teams current with
involvement in emerging technologies and techniques. These two diametrically opposed positions will never
change in this author's opinion, as the developer normally will only address the technical and financial
aspects of the specific systems he is developing, while the user, on the other hand, is concerned with
standardization as applied to multiple system applications. Additionally, there is another dimension to the
standardization issue which often is not considered in any discussion of computer systems standards.


Specifically, the question is at what point or level does one standardize? For example, one could
standardize at the Instruction Set Architecture (ISA) level while allowing the designer to incorporate
the latest technologies and to change the physical and electrical characteristics (e.g., overall
dimensions, the internal mechanical structure of the machine, and cooling and primary power requirements).

Table 2, "Standardization Options", lists a number of possible standards which the user and/or the
developer of aircraft avionic equipment could adopt. Several or many of these options could be combined
to form an all-encompassing single standard depending on the financial resources available,
maintainability/support approaches, and the end operational use of the system(s). However, the more these
standardization options are molded into one single standard, the greater will be the negative reaction
of the developer, as stated earlier.

TABLE  2  STANDARDIZATION  OPTIONS

Languages
  -  Preprocessor  (POL)
  -  Compiler  (HOL)
  -  Assembler  (MOL)

Instruction  Set  Architecture  (ISA)
  -  Single  Instruction  Set
  -  Modular  Instruction  Set
  -  Extensible  Instruction  Set

System-Level  Interconnection  Schemes
  -  Bus
  -  Loop
  -  Network
  -  Bus  Interface  Unit

System-Level  Protocol
  -  User  Module  to  Operating  System
  -  Operating  System  to  Hardware

Physical  Interface
  -  Pin  Compatible
  -  Plug  Compatible

Physical  Implementation
  -  Black  Box
  -  Form,  Fit,  Function
  -  Standard  Module
  -  Micro-chip  Set

Of all the Standardization Options listed in Table 2, adoption of an Instruction Set Architecture (ISA)
as a standard offers the greatest economic return on investment to the customer. This assumes that
the ISA selected as a standard has an established user and support software base.

If one were to address the standardization issue solely on the basis of generalized hardware and language
(HOL) alternatives, then a matrix of comparative risks can be defined. Figure 4, Hardware Standards,
shows the technical, schedule, and financial risks for various hardware alternatives. It should be noted
that high and medium/high risk factors have been assigned to the Strict Processor and Microprocessor
Standards because: (1) there is a lack of experience with building DCCS's for aircraft/avionic systems
applications; and (2) it is not clear at this point that a single, cost-effective microprocessor can be
established as a standard for all applications throughout an Avionic System Architecture.

The key issue relative to establishing a microprocessor as a standard piece of hardware is at what point
does one not enforce standardization. For example, is every application which calls for a microprocessor
whose word length is less than 16 bits subject to the standard? Or is there a minimum memory size below
which the microprocessor would be excluded from standardization considerations? These decisions, while
seemingly inconsequential, do have a significant impact on the design of the system and development costs.

Many individuals have postulated that microprocessors will decrease the cost of computer hardware to the
point at which it is an insignificant factor in future developments of DCCS's. This claim has yet to be
proven. Unfortunately, the rising costs of both applications and support software have lent credibility
to the position that the cost of microprocessors is no longer of relative importance in system
life-cycle cost considerations.

Regardless of the availability of comparatively low-cost microprocessors and microcomputers, the high
cost of software development and maintenance has given considerable support to the utilization of HOL's
and, in particular, a single HOL wherever possible. Figure 5 indicates that assembly level coding is
definitely more costly than using HOL(s). There are two major reasons for this cost differential:
(1) there is a need for the programmer to know the particular Instruction Set Architecture of the target
machine(s); and (2) in most cases assembly level code is used mainly for very difficult program tasks
such as input/output, operating systems, and executive control of real-time systems. In each of these
instances the programmer must work with "tight" coding requirements.


Within the context of this paper, commonality is defined as the utilization of equipment(s), or parts
thereof, in multiple operational applications. For example, many aircraft cockpit controls and displays
could be common within a single "family" of aircraft types. Each aircraft, however, would have a
specific set of cockpit controls and displays tailored to its own particular operational need. Across
all aircraft within the family, the controls and displays would perform common functions. The equipment
itself need not consist of standard items to be considered within the context of commonality as the term
is used herein. (See Figure 6.)

The potential for major cost-savings does not exist with the utilization of common equipment as it does
with standard equipment because of the specific tailoring or uniqueness of the equipment to each
application. On the other hand, when the developer applies commonality concepts effectively, there is a
great potential for significant cost-avoidance. For example, specific display components, bulk memories,
algorithms, etc., can be applied across all applications. In doing so, the developer avoids those costs
associated with developing totally unique equipment designs for each installation.

2.9  RELIABILITY,  MAINTAINABILITY,  AND  AVAILABILITY  (RMA)

In simplistic terms, aircraft/avionic systems are designed to meet pre-established levels of reliability
so as to be available for operational use for given time periods prior to a failure occurring which would
require a maintenance action to be taken. When the reliability levels are not achieved, the equipment
is not available and additional maintenance actions have to be taken. This cause and effect situation
is a major contributor to operational support costs. In the author's opinion, it is highly unlikely,
given the current degree of technical sophistication of aircraft/avionic equipment, that these costs
will decrease in the near future. Furthermore, unless new Avionic System Architectures are developed
and designed as described earlier, the current RMA problems will remain.

It should be emphasized that using a DCCS as the basis for a future Avionic System Architecture will not
of itself negate the current RMA problems. However, if the system is designed in a structured manner, it
can include many features which would assist in reducing RMA shortcomings exhibited by current
operational systems. Key features which will have a major impact in improvement of RMA factors and a
corresponding reduction in life-cycle operational costs are: fault tolerance, redundancy, and
reconfigurability.

The capability to incorporate fault-tolerance, redundancy, and reconfigurability techniques and concepts
into a DCCS is based primarily on the availability of relatively inexpensive microprocessors. Given
that these microprocessors will be available, the major question remaining is at what level does the
developer insert these concepts into the design of the DCCS. These concepts can be applied either
on a system-wide basis, or at any of the subsystem or functional grouping levels. Furthermore, with
the coming of age of the reconfigurable memory, one can now have increased availability at the component
level.
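
As a rough illustration of why redundancy is expected to raise availability, the following sketch uses
the standard steady-state availability relation and the textbook expression for identical units in
parallel. These formulas and the sample figures are not taken from the paper; they assume independent
failures and that one working unit is sufficient, which a real avionic subsystem may not satisfy.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def redundant_availability(unit_availability, copies):
    """Availability of `copies` identical units in parallel.

    Assumes failures are independent and that any single working unit
    can carry the function (both are simplifying assumptions).
    """
    return 1.0 - (1.0 - unit_availability) ** copies

if __name__ == "__main__":
    # Hypothetical figures for one microprocessor-based unit.
    single = availability(mtbf_hours=990.0, mttr_hours=10.0)
    for copies in (1, 2, 3):
        print(copies, round(redundant_availability(single, copies), 6))
```

The same calculation can be applied at the system, subsystem, or component level by changing what is
counted as a "unit"; it does not account for the added software and interconnection complexity that
the redundancy itself introduces.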

The coupling of fault-tolerance, redundancy, and reconfigurability with automated fault-detection and
isolation also offers management a vehicle for minimizing RMA life-cycle cost for future DCCS's.
Unfortunately, the expected theoretical improvements in the RMA values have yet to be fully proven out
in actual practice over a substantial period of operational time. While there is no reason to believe
that the potential gains cannot be achieved, there is an area of concern (mentioned earlier) that should
be addressed during the development of the Avionic System Architecture, namely the actual amount
of distribution of computing resources throughout the system and its impact upon the associated software.

The complexity of the software associated with a DCCS is going to be a major challenge by itself. There
are many problems yet to be faced with an aircraft/avionic DCCS which may contain over 150 microprocessors
throughout the aircraft. Additionally, there could be hidden costs because of unforeseen needs for
performing extensive test and evaluation of such a system. Hopefully, sufficient software verification
and validation techniques will be available to ensure that the developer can adequately separate proving
the quality of the software from proving the ability of the DCCS to function adequately as an integrated
network of computer resources.

3.  EXTERNAL  FACTORS  IMPACTING  DCCS  DEVELOPMENTS 

3.1  EXTERNAL  FACTORS 

The question that developers of an aircraft/avionic DCCS must ask themselves before starting out on a new
design is what degree of control they have over their final design. Unfortunately, the dynamics of
the microelectronics industry, as mirrored by the microprocessor/microcomputer marketplace, presently defy
any reasonably precise answer to the question. At best, one can only hope that the impact
upon DCCS development efforts and related life-cycle considerations of the Avionic System Architecture
is minimized through the recognition of external factors during the planning phase of the project. The
following external factors are identified as having a major impact upon the DCCS design and development
and thus should be addressed during the planning phase of the project: the rate of technology
advancement, technology dependence/independence, limited production runs as a function of time and lack of
leverage upon the market, technology transfer and insertion, and the vertical structure of certain
corporations.

3.2  TECHNOLOGY  ADVANCEMENT 

It is almost inconceivable that the technological inventiveness of the solid state electronics industry
is such that new products become obsolete almost immediately after introduction into the marketplace.
Breakthroughs in such areas as materials, manufacturing processes, computer aided design, architectures
and packaging are made almost daily. Furthermore, it is highly unlikely that in the near future there
will be any slow-down in new performance capabilities being introduced in the microprocessor/
microcomputer marketplace. If anything, there will be a continued explosion of new applications as the
prices of those machines (micros) decrease as a function of time.


All other design factors being equal, advancements in the solid state electronics field are not
necessarily detrimental to the aircraft DCCS developer. Desired system-level capabilities such as
redundancy, reconfigurability, and fault-tolerance can now be built into the system economically and
contribute to achieving the desired performance goals set for system maintainability, reliability, and
availability. On the other hand, these capabilities cannot be logistically supported over the life-cycle
of the DCCS without taking into account the other external factors which impact DCCS developments.

3.3  TECHNOLOGY  INDEPENDENCE 

In similar fashion to the commercial computer industry expression of "plug-to-plug" compatibility, the
phrase "technology independence" has been introduced into the military-industry lexicon. In a manner of
speaking, it can be considered a technology-level equivalent to the form, fit and function (3F) physical
implementation approach addressed earlier. The concept is very simple: by being independent of
technology uniqueness, one can insert new technologies at given time intervals during the life-cycle of
the aircraft/avionic DCCS. The economic return on investment for incorporating this capability into the
initial system design is significant. On the other hand, it does demand that there be some level of
mechanical packaging standards in order to introduce the new devices and/or components into the existing
equipment with minimum impact upon the associated logistic considerations. Assuming that a standard
mechanical packaging concept can be established for both the in-being and the potential replacement
technologies, then there will be a logistic cost avoidance in that the higher level electronic assemblies
do not change with the insertion of the new technology.

With regard to software, however, technology independence takes on a number of meanings, all of which
depend on the point of view of the developer. For example, applications and support software for a
given programmable digital computer could be run, with no changes, on a newer technology machine,
provided the Instruction Set Architecture and other software program dependent characteristics are taken
into consideration during the initial design phase. A second conceptual approach would be to keep the High
Order Language (HOL) interface independent of the operational target machine. Lastly, a third approach
would be that of using a pre-processor in the software development chain. Specifically, with this
approach, one establishes the near-equivalent of a hardware plug-to-plug compatibility by using a
pre-processor as a software program translator. In this instance, the firmware is used to provide the
software compatibility link.

Regardless of the type of hardware technology used, the concept of software transportability implies,
in itself, technology independence. However, unlike hardware technology independence, software
transportability of its very nature explicitly implies reusability of software, as opposed to the basic
concept of plug-to-plug compatibility, namely that of technology insertion through technology
invisibility (independence).

It can be generally stated that software transferability offers the developer a basis for cost savings.
On the other hand, since the application/program will no doubt differ to a certain degree from
functional task to functional task, new compilations will have to be performed in order to insert
different application dependent parameters and data. Thus it is perhaps more correct to state that, as
a minimum, using software transportability concepts in an aircraft/avionic system design will yield
a cost avoidance in that both the operational and support programs do not have to be re-created from the
initial design stage.

3.4  LIMITED  PRODUCTION  RUNS 

There is not a better method to ensure price stability than that of having the advantages that accrue
from large scale procurements over a given period of time. In essence, this is the economy of scale
factor of classical economic theory. Unfortunately, it is a fact of life that at best there will be
limited quantities of aircraft/avionic Digital Computer Control Systems procured by any one development/
procurement activity. For example, even if an aircraft manufacturing firm has incorporated DCCS's
(utilizing microelectronic chips) into several different aircraft models, the quantities of either
commercial or military aircraft coming off the production lines are minuscule compared with the
quantities of microelectronic chips currently being procured by both the automotive and toy industries on a
per year basis.

It would appear that there are two management alternatives which would overcome the inherent economic
shortcomings of limited production runs for military applications of commercial components. The first
approach would be to add onto existing commercial production runs which are expected to produce
microelectronic chips over an extended period of time. In this instance, individual procurement of chips
for the DCCS would be made part of a standard product line which the solid state electronics firms
expect to market to multiple users into the foreseeable future.

In the second case, the aircraft/avionic systems manufacturer would "front-end" the development costs
associated with the design of a given microelectronic chip and only use the solid state electronics
firms as a production facility. Thus, the system developer orders parts to his specifications and is not
dependent upon the microelectronic circuit manufacturers for any initial non-recurring investment in
chip design and development costs.
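
The economic difference between the two alternatives can be sketched by spreading a non-recurring
design cost over the production quantity. The function name and all cost figures below are hypothetical
and serve only to show how strongly a small avionic production run is penalized relative to a
high-volume commercial one.

```python
def unit_cost(non_recurring, recurring_per_chip, quantity):
    """Cost per chip when a one-time design cost is amortized over `quantity` chips."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return non_recurring / quantity + recurring_per_chip

if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 in design cost, 10 per chip to manufacture.
    for quantity in (1_000, 100_000, 1_000_000):
        print(quantity, unit_cost(1_000_000, 10, quantity))
```

Under these assumed figures a run of one thousand chips costs about a hundred times more per chip than
a run of one million, which is the economy of scale argument of this section expressed numerically.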

It is essential that an acceptable manufacturing alternative be established prior to production in order
to maintain the availability of chips throughout the lifetime of the DCCS or until the chips are
replaced by a new technology during the operational phase of the system life-cycle. It is imperative to
note that the lack, or shortage, of logistic spare parts destroys any logistic planning performed during
the R&D stage of the DCCS and further compounds the subsequent operational problems, which range from
day-to-day system availability to long term maintainability and reliability.


3.5  LACK  OF  ECONOMICAL  LEVERAGE 


Since World War II, the aerospace industry has introduced many advancements in the electronic
state-of-the-art into the operational environment. In general, the industry has introduced new technologies
because it has had both the performance need and the economic leverage to do so. Over the last
decade, this preemptive position has been eroded so that presently the aircraft/avionic developers have
very little impact upon the technological directions of the solid state electronics industry (based upon
a percentage of sales). Neglecting such considerations as global macroeconomics, the changing role of
the multi-national firms, and the emergence of a truly international capability to manufacture solid state
electronic devices, no single factor has had such a major negative impact upon the economic leverage of
the aircraft/avionic firms over the solid state electronics marketplace as the coming of age of
microelectronic circuitry. That this is so is ironic in that the aerospace firms first introduced
integrated circuits into aircraft/avionic applications in the early 1960's.

Since the mid 1960's, the combined sales of aircraft/avionic systems to both the private and public
(defense and space) sectors have declined. While decreasing sales volume of aircraft per unit of time has
had a profound negative effect upon industry leverage, it has really been the quantum jump in densities
of the chips (transistors per unit of area) which has become the dominant factor in changing who, in the
private sector, has the economic leverage over the solid-state industry. That this is so should be
somewhat self-evident in that the higher density chip development made obsolete the first generation
"integrated circuit". It was, for all practical purposes, a single (physical) low cost replacement for
hundreds of individually packaged integrated circuits. Thus, in reducing by orders of magnitude the
number of chips to be procured, all vestiges of economic power over the solid state marketplace by the
aircraft/avionic system developers disappeared.

In retrospect, it is somewhat ironic that in the early 1960's it was the aircraft/avionic industry that
was the only group of users that "carried" the then infant microelectronic industry during those days of
high-risk integrated circuit venture enterprise. By contrast, today a common 3 to 5 chip microcomputer
design serves applications in the aerospace industry, automated factories, medicine, as well as the home
entertainment market. On the other hand, projecting into the future, there is the possibility that there
may be yet another "role reversal" concerning leverage of the market. Specifically, the use of Very
Large Scale Integrated Circuits (VLSIC) in aerospace applications may very well prove to be the key
factor in having the microelectronic circuit manufacturers re-tool to meet once again the unique needs
of the aerospace industry. Whether this situation will come to pass has yet to be determined. Until that
time, however, aircraft/avionic system developers will have to fit their needs into standard product lines
if they do not wish to incur large non-recurring costs for production of customized chips.

3.6  VERTICALLY  STRUCTURED  CORPORATIONS

Throughout the private sector there are many instances where a corporation is vertically structured, that
is, where the organization is made up of companies and/or divisions which supply the raw materials,
engineering (including R&D), manufacturing, and sales and distribution functions. In essence, the
corporation does not go outside of itself for any major aspect of its operations and for all practical
purposes is its own supplier of goods and services. The "verticality" of the organizational structure
is derived from the nature of the manufacturing process whereby a unit of the corporation builds upon
the output of another part of the organization. The management and cost advantages of this situation,
whereby availability of materials, scheduling, and commitment to corporate goals are all self-contained
and controlled, need no further amplification.

With the advent of the transistor, many firms added a solid state technology division (as a separate
profit and loss center) to the corporate organization. Except in certain instances, the majority of these
solid state technology plants manufactured parts for the general commercial marketplace with no objective
of serving internal corporate needs for devices such as transistors. In the author's opinion, the
subsequent introduction of the microelectronic chip initiated the push for many aircraft/avionic equipment
manufacturers to also take corporate action to change to a vertical organizational structure. The
microelectronic chip took away many design prerogatives from the developers and effectively made the
solid state electronics manufacturing firm a design competitor, albeit at the very low end of the design
process. However, as the techniques for manufacturing microelectronic chips matured, and the industry
introduced medium and large scale integrated circuits, the impact upon the classical design freedom of the
aircraft/avionic equipment firms became fairly significant as the chips began to contain more and more of
the individual circuits previously developed as physically separate designs.

To counter the growing impact of the external factors addressed earlier and the inroads that advanced
microelectronic circuitry was making upon their traditional development efforts and organizational makeup,
many aerospace firms changed their corporate structures to a vertically-oriented one. What many of these
corporations did within the past decade was to create an in-house solid state electronics and technology
organization with the prime customer being the corporation itself. The capabilities of these in-house
facilities are, as could be expected, as sophisticated and advanced as many of those in California's
so-called "silicon valley".

It is premature to state that the vertically-oriented aerospace firm will provide a management approach to
overcoming the negative aspects of external factors such as technology advancement and independence,
limited production runs, and the general lack of economic leverage over the industry. An exception, of
course, is the case where the aerospace company provides chips to other divisions in the organization in
bulk quantities.

In general it appears that the creation of an in-house solid state manufacturing facility is a questionable
long-term cost-effective solution to the problem. Specifically, the economic law of supply and demand will
become a dominant factor relative to the final solution. That is, if the number of firms having in-house
solid-state technology R&D and manufacturing facilities increases unabatedly with time, then it follows that,
in turn, the aerospace firms will become contributors to the herein defined technology and manufacturing
external factors over which they currently have little control. It is also not unrealistic to envision
that with time, the aerospace firms will also become suppliers of microelectronic circuitry to the
marketplace and thus eventually become competitors with today's solid state electronics firms. To use
the cliche, the solution becomes part of the problem.

4.  CONCLUSIONS 


There are several conclusions that may be reached relative to economic considerations for future Naval
aircraft/avionic Real-Time, Distributed Computer Control Systems. The primary conclusion is that
designers/developers will have very little economic leverage over the microelectronics industry with the
current low rates of production of aircraft and related avionic systems. What follows from this lack of
economic control is questionable future enforcement of standardization and commonality requirements.

On the other hand, if there is an economy of scale due to a large quantity buy over an extended period
of time, then there will accrue to the customer the expected savings in development and support costs.
However, with the rapidity of technological change in the solid-state electronics industry, it is
becoming more and more self-evident that to fully obtain the economic benefits of standardization and
commonality, technology independence over the life-time of the aircraft/avionic system must be maintained.


ARCHITECTURE ALTERNATIVES            RISK
                                     TECHNICAL                SCHEDULE               FINANCIAL

(1) DEDICATED SUBSYSTEM              LOW/MEDIUM               LOW/MEDIUM             LOW/MEDIUM
    PROCESSORS

(2) REDUNDANT DEDICATED              MEDIUM                   LOW                    LOW/MEDIUM
    SUBSYSTEM PROCESSORS             (WEIGHT)
    WITH LOCAL BUSES

(3) REDUNDANT DEDICATED              MEDIUM/HIGH              LOW                    LOW
    SUBSYSTEM PROCESSORS             (WEIGHT)

(4) REGIONAL GROUPS OF               MEDIUM                   LOW/MEDIUM             LOW/MEDIUM
    SUBSYSTEMS

(5) CENTRAL PROCESSORS               MINIMUM                  MEDIUM                 HIGH
                                                              (INTERFACING)          (INTERFACING
                                                                                     & SUPPORT)

(6) MULTIPROCESSORS                  HIGH                     MEDIUM/HIGH            MEDIUM/HIGH
                                     (SOFTWARE &
                                     BUS PROBLEMS)

FIGURE  1

ARCHITECTURE  ALTERNATIVE  COMPARISON


[Figure: cost/complexity plotted against % utilization of centralized resource]


FIGURE  2

CENTRALIZED  PROCESSING  ARCHITECTURE
SOFTWARE  COST/COMPLEXITY


COST/COMFLEXITY 


•  THE  fcUM  OF  THE  TWO  SOFTWARE  TRENDS  INDICATES  A  FONT  OF  DISTRIBUTION 
WHICH  MAY  K  OFTIMUM.  FURTHER,  AT  EITHER  END  OF  THE  DISTRIBUTION 
SPECTRUM  THE  WORST  OF  BOTH  WORLDS  MAY  EXIST! 


FIGURE  3 

DISTRIBUTED  SYSTEM  TRADEOFFS 




FIGURE  4 

HARDWARE  STANDARDS 


FIGURE  5 

LANGUAGE  STANDARDS 


FIGURE  6 

STANDARDIZATION  *  ARCHITECTURE 
INTERACTION  MATRIX 




FUNCTIONAL DOCUMENTATION - A PRACTICAL AID TO THE ORDERLY
SOLUTION OF THE SYSTEM DESIGN PROBLEM


J.T. MARTIN

FERRANTI COMPUTER SYSTEMS LIMITED,
Western Road, Bracknell, Berkshire, England.


SUMMARY 


This paper describes a method of breaking down a Customer Requirement in an orderly manner so as to produce
progressively more detailed design levels such that at any one stage of the System Design the particular
part of the design under consideration can firstly be easily understood and secondly comparatively isolated
from the other parts of the design.

The most important characteristic of the design methodology is that the Requirement is considered in purely
Functional terms until a highly detailed level of the design is reached. An example of this design
methodology and the technique of Functional Documentation is given and the paper concludes by discussing
the advantages that can accrue from a sensible use of the design methodology.

1.  INTRODUCTION 

To produce a successful design the system designer must start his design from the viewpoint of what the
customer requires and work down to what sub-systems are required to achieve this requirement - the Top Down
approach.

In order to carry out this Top Down design in a logical, structured way it is important that:

(a) the overall problem is decomposed in a controlled fashion

(b) each layer of the design is considered in the correct level of detail such that on the one
hand sufficient information is considered before moving to the next lower level of design, while
on the other hand the particular level of design reached is not unduly cluttered by
consideration of too much detail.

The method used by Ferranti Computer Systems Limited to achieve these ends is Functional System Design
utilizing as a tool in this process the powerful Functional Documentation (FD) technique.

The basic concepts behind design phase documentation were developed by the Systems Effectiveness Laboratory
of Technical Operations Incorporated, Burlington, Mass, and further amplified by the United Kingdom Royal
Navy.


This system of Functional Documentation, first used by Ferranti Computer Systems Limited (FCSL) as a tool for
the design of extremely complex shipborne distributed processing systems, is now being used to carry out the
demanding task of System Design for modern airborne distributed computing systems.

The main fundamentals of FD will now be described together with a brief example of the use of the technique.

2.  THE  PURPOSE  OF  DESIGN  DOCUMENTATION

A fundamental tenet of the FCSL approach is that design and the documentation of that design are
inseparable. The production of design documentation is, therefore, an integral part of the design task and
the design documentation is itself a most important aid to the design process. At each stage of
development, existing design documentation forms the basis for further development.

Until system development and implementation begins, the design documentation is the only tangible evidence
of the intentions of the design team. It is a paper model of the system.

Furthermore, considering the project as a whole, design documentation must meet the needs of all subsequent
stages through development, production, integration, installation and trials to post-delivery support. The
design documentation must, therefore, provide not only a description of the proposed system, but also
information necessary for the preparation of test requirements, trials specifications and servicing and
maintenance data.

The successful design and development of an integrated system requires that each member of the design team
be aware of the current design intent of his colleagues. In practice this implies a real time documentation
system with a language common to the three principal disciplines involved, namely hardware, software and
user.

Functional Documentation (FD) is this common language for use when the three disciplines must work in
co-operation. It is a design tool developed specifically to assist in co-ordinating the design and
through-life development of real-time systems. The formal documentation system of FD channels the paperwork
output of system, hardware, software and user engineering staff into a standard format which is circulated
amongst, then understood and agreed by, the design team. During the Project Definition phase it is the only
available evidence of progress.




The completed FD forms an agreed document which defines the required system functions and their interfaces,
and the inter-relationships between the disciplines. It is the specification of the system which shall be
implemented by the individual disciplines during the Project Development phase. Hardware FD (HFD),
Software FD (SFD) and User FD (UFD) are the languages employed at that stage when the individual disciplines
may validly be developed independently.

The primary purpose of FD is, therefore, to make and communicate the definitive statement of design. In
fulfilling this it also achieves the following objectives:

(a) To assist in the correct breakdown of the design into separable tasks and to logically define
the scope, boundaries and interfaces of each task before the commencement of that task. As a
result of performing each task, not only is technical progress achieved, but more detailed
tasks are defined which also fit into the overall structure.

(b) To allow technical analysis of the design. The documentation effectively provides a paper
model of the system at all design phases and is constructed so as to highlight areas where
design may be doubtful, inconsistent, ambiguous or incomplete. It is particularly suitable for
examining the hardware/user/software interfaces and for predicting the reliability and
maintainability of the proposed system. It further allows an individual designer to visualise
the implications of design changes on related areas.

(c) To enable technical management and the customer to monitor the progress of design, both to
ensure that timescales and workloads are satisfactory and that technical requirements are being
achieved. It is especially amenable to the use of PERT.

(d) To provide a standard communications medium between the members of design teams working in
different disciplines and in different companies.

(e) To provide a permanent record of design, as it proceeds. It allows, for example, new staff
joining the project to appreciate the philosophy, limitations and state of the design with a
minimum of effort, and equally reduces disruption when members leave the project.

(f) To provide a smooth transition of technical data into the maintenance handbook.

(g) To form a basis and design record for subsequent Post Design (PDS) activities.

(h) To provide an entry to HFD, UFD and SFD.

3.  FD  PRINCIPLES

FD is a technique of logical and ordered technical description which uses graphical and pictorial
presentation, supported by the written word, as the communications medium. It therefore takes advantage
of the precision inherent in diagrammatic presentation.

To provide clarity of technical description, the subject matter is sub-divided in two dimensions, which are
known as the "level" and the "function".

(a) Level. As the design phase of a project develops, information becomes available in increasing
degrees of detail and complexity. By its nature, early phase information is more general. It is said to be
"high level" information and is by definition the first to be documented. Subsequent information may be
classified as either "intermediate" or "low" level.

These levels are assigned a numerical reference as follows:-

Highest level        : 1

Intermediate levels  : 2

                       3

                       etc.

Lowest level         : n

The number of levels actually required for a complete description will depend on the complexity of the
subject, and the amount of original design/development work.

(b) Function. The function, in the general case, is defined as that grouping of hardware, software
and user necessary for the achievement of a required event or events. This implies that any possible
division of a subject into its hardware, user or software boundaries is of secondary consideration.
Functions will exist at all levels but the "required event" will become increasingly detailed at lower
levels. For example, "target destruction" may be a valid event at a high level and "status bit set" may be
equally valid at a lower level. Therefore each function described at higher levels will be progressively
sub-divided and amplified at lower levels.

3.1  Level/Function Relationship

Each function identifies a number of sub-functions which are then "expanded", i.e. described separately in
greater detail, as functions at the next lower level. The process of sub-dividing the functions is repeated
until a sufficient level of detail is reached. This is the lowest level, which will identify sub-functions
performed solely by hardware, software or user.

The pyramid of functions formed by the progressive sub-division of the overall function is termed an
hierarchical structure. Each function at each level is at once:-


(a) A statement of requirement for its sub-functions at the level below, and

(b) A definition of the implementation of the requirement stated at the level above.

Each piece of information necessary to define the design has a correct place in this logical structure, so
missing information is highlighted, and there should be no duplication between levels or functions.
Information is, therefore, rapidly accessed and easily retrieved.
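
The level/function relationship can be pictured as a simple tree. The sketch below is illustrative only and is not part of the FD technique as published: the class name, the field names and the disciplines assigned to the example functions are assumptions. It shows how each function carries its sub-functions one level down, and how lowest-level functions not yet assigned solely to hardware, software or user stand out as missing information.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Function:
    """One function in the hierarchy (field names are hypothetical)."""
    title: str
    discipline: Optional[str] = None  # "hardware", "software" or "user" at the lowest level
    sub_functions: List["Function"] = field(default_factory=list)


def walk(function: Function, level: int = 1) -> Iterator[Tuple[int, Function]]:
    """Yield (level, function) pairs, highest level first."""
    yield level, function
    for sub in function.sub_functions:
        yield from walk(sub, level + 1)


def unresolved(function: Function) -> List[str]:
    """Titles of lowest-level functions not yet assigned to a single discipline."""
    return [
        f.title
        for _, f in walk(function)
        if not f.sub_functions and f.discipline not in ("hardware", "software", "user")
    ]


# Example tree using function names from Annex 1; the disciplines are assumed.
match_control = Function("MATCH Control", sub_functions=[
    Function("MATCH Preparation", sub_functions=[
        Function("Conversion of Relative Wind to True Wind", discipline="software"),
        Function("Insertion of True Wind", discipline="user"),
    ]),
    Function("MATCH Approach"),
    Function("MATCH Attack"),
])

for level, f in walk(match_control):
    print("  " * (level - 1) + f"Level {level}: {f.title}")
print("Still to be expanded:", unresolved(match_control))
```

Because every item has exactly one place in the tree, a function that has neither been expanded nor assigned to a discipline is found by a single traversal.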

3.2  FD Formats

At each level, and for each function at that level, information is produced in three categories which are
mutually dependent. These are:-

(a) Functional Block Diagram (FBD). The function as defined by the previous level is expanded into
its various sub-functions, and the inter-relationship of these sub-functions is defined in terms of
sequence and information flow. The sub-functions are then each subsequently further developed on
individual FBDs at the next level.

(b) Functional Text (FT). The FBD is supported by text, including a concise statement of the
purpose of each sub-function, which may be presented in the corresponding physical location on the page
facing the FBD, as Functional Blocked Text (FBT).

(c) Supplementary Information (SI). This category covers all other information not readily
assimilated into the first two categories. It includes such data as physical layouts, channel allocations,
manning requirements, design theory if applicable, etc.

It is the FBD which makes the definitive statement concerning the function and its scope, whilst the FT or
FBT is of a supporting nature. Any additional information may be presented as SI.

3.2.1  FD Document

The FT, FBD, FBT, and SI (in that order) for each function may be made up with a covering front sheet or
comment sheet and master page/distribution list as an FD document. These individual FD documents then fit
into the hierarchical structure and build up the complete Functional Documentation for the system.

3.2.2  Functional Reference

All functions at all levels (except Level 1, the top level) are assigned a functional reference of the form
F1.2.3.4... The number of numerical digits in the reference defines the level at which the FD document for
that function will appear.

Thus F1 appears at Level 2
     F1.2 appears at Level 3
     F1.2.3 appears at Level 4

etc.

Since every function at, say, Level 2 is expanded at Level 3, so each functional reference is expanded also,
retaining the digits of the original functional reference.

If a given function requires more than one FBD document to expand it meaningfully at the next level, then
the subject is divided up as befits the circumstances. The functional references for, say, three FBDs would
appear as F1.2.3/1, F1.2.3/2, F1.2.3/3 and the overall reference at the previous level would appear as
F1.2.3/1.3. This facility, which should not be abused, is however very useful for describing a function
which has for example more than one mode (e.g. normal and reversionary modes) or more than one phase of
operation.
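
The numbering rule lends itself to a short worked example. The following is a minimal sketch, assuming references are written without spaces as in "F1.2.3" or "F1.2.3/2"; the function names and the regular expression are illustrative, not part of FD. It derives the level from the number of digits (F1 at Level 2, F1.2 at Level 3, and so on) and forms the reference of a sub-function by retaining the digits of its parent.

```python
import re

# "F", dot-separated digits, then an optional "/n" sheet number for a function
# that needs more than one FBD. This pattern is an assumption for illustration.
_REFERENCE = re.compile(r"^F(\d+(?:\.\d+)*)(?:/(\d+))?$")


def parse_reference(ref):
    """Return (digits, sheet, level) for a functional reference."""
    m = _REFERENCE.match(ref)
    if m is None:
        raise ValueError(f"not a functional reference: {ref!r}")
    digits = [int(d) for d in m.group(1).split(".")]
    sheet = int(m.group(2)) if m.group(2) else None
    # One digit appears at Level 2, two digits at Level 3, and so on.
    return digits, sheet, len(digits) + 1


def expand(ref, sub_function_number):
    """Reference of a sub-function: the parent's digits plus one more."""
    digits, _, _ = parse_reference(ref)
    return "F" + ".".join(str(d) for d in digits + [sub_function_number])


print(parse_reference("F1"))        # level 2
print(parse_reference("F1.2.3/2"))  # second FBD of a Level 4 function
print(expand("F1.2", 3))            # F1.2.3
```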

Clearly it is advantageous if documents normally issued separately, e.g. trials schedules, interface
specifications, requirements specifications, are published as further Supplementary Information. These
documents are then indexed by the functional reference and are accessible from the hierarchical structure.

4.  FD  EXAMPLE 

Annex 1 to this paper presents an example of the use of the FD technique.

The example shows how part of a system (Function 33 of the overall system in the example given) is first
simply described at Level 1 with only the main attributes of the function, as seen externally, described.
The overall function is considered to be dividable into three (in this example) sub-functions which are then
each considered in more detail at the next level down (Level 2). This process of functional division can be
carried out until it is possible to define the process that each box on the FD is to perform in either
hardware, software or user terms. In the example given it can be seen that each box on the Level 2 diagrams
can be used to produce the necessary specifications to enable the implementation of the particular box in
question.

5.  CONCLUSION


The Functional Documentation system allows the orderly decomposition of an overall system specification
into progressively more detailed levels of design information. At any particular level of the design the
amount of information to be handled is manageable by, and understandable to, the designer tasked with the
job of furthering the design. The technique allows people from the different disciplines working on a
project (user, hardware, software) to communicate together in a commonly understood descriptive format.
The technique forces a complete examination of the system specification and exposes any deficiencies or
omissions that may exist. Functional Documentation allows the customer full insight into the design, shows
him which areas of the original specification are insufficiently precise and also shows him which parts of
the specification are most difficult to accomplish and perhaps candidates for re-examination in terms of
cost/complexity/requirement tradeoffs.

Using Functional Documentation allows a complete functional description of the design to be produced, and
changed if necessary, before the prototyping stage is started. Design changes can be made by simply
altering lines on paper, not by expensive re-design of hardware or software modules. At the end of the
Functional Design stage the information necessary for the production of the hardware, software and user
specifications is available, consistent and achievable.


ANNEX  1 

EXAMPLE OF THE USE OF FUNCTIONAL DOCUMENTATION

1.1  Functional Block Diagram Format

1.2  Example Functional Documentation Level 1

1.3  Example Functional Documentation Level 2 Function 1

1.4  Example Functional Documentation Level 2 Function 2

1.5  Example Functional Documentation Level 2 Function 3




ANNEX  1.1 

FUNCTIONAL  BLOCK  DIAGRAM  FORMAT 


The diagrammatic format employed to describe the design is that of "Functional Documentation" as used
within Ferranti Computer Systems Ltd. The following is a brief summary of the salient points:

(a) The diagrams illustrate primarily the flow of information (be it data or control) between
functions. Thicker lines are used to emphasise significant information paths. The diagrams are
essentially time-sequential, left to right.

(b) Functions which are to be implemented by the "system", be it hardware or software (or as yet
unknown) are illustrated by the symbol:

[symbol not reproduced]

(c) Functions which are to be implemented by an operator action are illustrated as:

[symbol not reproduced]

The device used in the operation is identified above the symbol, e.g. KB = Keyboard, TB = Tracker Ball.

The use of the operator function implies some interfacing hardware and software to get the information into
the system. These hardware and software components (Service Functions) are omitted if they do not add to
the understanding of the function in hand.

(d) Display of system information to the operator is shown by the following symbol, annotated by
the device type:

[symbol not reproduced]

Again the Service Functions involved are omitted if non-critical to understanding of the application.

(e) Hardware items are shown within dashed boundaries.

(f) Where other Application Functions are involved in the operation of the module, these are
illustrated by the symbol:

[symbol not reproduced]

The operation of these functions would be detailed elsewhere in the documentation.

(g) There is frequently a requirement to illustrate alternative paths for information flow. The
following symbol is used:-

[symbol not reproduced]

This illustrates either source A or source B being routed to destination C, subject to the control input D.




ANNEX  1.2 

EXAMPLE  FUNCTIONAL  DOCUMENTATION  LEVEL  1 
MATCH  CONTROL 


1.  INTRODUCTION 

MATCH (Medium-Range Anti-Submarine Torpedo-Carrying Helicopter) is a weapon system which utilises a
helicopter to carry and launch a torpedo in an anti-submarine engagement.

The control function is involved, primarily, in the calculation of the helicopter's course to fly and time
to weapon release so that the aircraft controller can relay commands to the pilot. The procedure used for
course and launch calculations is known as Vectored Attack (VECTAC). The calculations take account of the
torpedo characteristics when deriving the aim point. The target position may be any track held by the
system, or may be a fixed datum point indicated by tracker ball. Ship's radar is used to track the
helicopter during its flight so that course corrections can be applied.

The function is also able to control a MAD Verification Run (MADVEC). In this application the helicopter is
used to carry Magnetic Anomaly Detection (MAD) equipment to a suspected submarine position so that the
presence can be verified or discounted by the change in magnetic field. This enables sonar contacts which
may in fact be shoals of fish, for example, to be eliminated. In a MADVEC the helicopter is guided to pass
directly over the selected position, without any launch calculations.

Although the name MATCH indicates that a helicopter is used, it is also possible to employ a fixed wing
aircraft without any differences to the operation of the function. Also it is possible to control an
aircraft which is based on a consort rather than own ship.

Guidance of the helicopter is achieved by voice communication of the appropriate orders between the
aircraft controller (ship's operations room) and the helicopter pilot. The pilot is responsible for
launching the weapon by his own weapon controls.

Control of two MATCH engagements is possible at any one time. The two engagements must be controlled one by
a North display operator and the other by a South display operator. A North (South) engagement may be taken
over by another North (South) operator to allow for equipment failure. The Functional Specification does
not describe two simultaneous engagements, as this simply implies two independent operations. A MATCH
engagement can be controlled from any console which has facilities for communications with the aircraft.

2.  MODE  OF  OPERATION 

The function is sub-divided into three sequential phases of a MATCH engagement, see Block Diagram.

2.1  MATCH Preparation

This function is concerned with the insertion of certain parameters necessary for the calculation of a
VECTAC and for the initiation of radar tracking on the helicopter. The actions involved can be carried out
prior to and in anticipation of an engagement.

The following data is inserted:

(a) Helicopter Indicated Air Speed (IAS).

(b) Torpedo Initial Search Depth (ISD) - this is the depth from which the torpedo search becomes
effective.

(c) Torpedo Ballistic Correction - this indicates the distance the torpedo will fly between release
and splash point.

(d) Magnetic Variation - for helicopter course corrections.

(e) Wind Data - also for helicopter course corrections.

Tracking of the helicopter can either be carried out manually or by an auto-extractor. The normal close
range surveillance radar, or the helicopter's transponder returns to the RRA equipment, may be used, as
appropriate to clutter and range conditions.

2.2  MATCH  Approach 

The approach function controls the engagement from the point of initiation until the final attack phase.
Throughout this phase the function repeatedly re-calculates the course to be steered by the helicopter so
that it will intercept the target allowing for movement of the target and drift of the helicopter. It is a
basic assumption of the calculations that the helicopter will fly the ordered course from its current
position at the pre-determined IAS and height, and that the target will maintain its last estimated heading
and speed. The vector calculations performed do not therefore allow for turning circles of the aircraft nor
for non-linear prediction of the target position and velocity.
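
The kind of vector calculation implied by these assumptions can be sketched as a constant-velocity intercept. The code below is an illustration only and is not the VECTAC algorithm, which the paper does not give: it assumes a flat east/north plane, treats the inserted air speed as the helicopter's speed through the air mass, and ignores the ballistic correction and Initial Search Depth. All names are hypothetical.

```python
import math


def course_to_steer(heli_pos, target_pos, target_vel, wind_vel, airspeed):
    """Return (course_deg, time_to_go) for a straight-line intercept, or None.

    Positions and velocities are (east, north) pairs in consistent units.
    wind_vel is the velocity of the air mass over the ground (where it blows to).
    """
    dx = target_pos[0] - heli_pos[0]
    dy = target_pos[1] - heli_pos[1]
    # In the frame of the moving air mass the target moves at (target - wind).
    rx = target_vel[0] - wind_vel[0]
    ry = target_vel[1] - wind_vel[1]
    # Solve |d + r*t| = airspeed * t for the earliest positive t.
    a = rx * rx + ry * ry - airspeed ** 2
    b = 2.0 * (dx * rx + dy * ry)
    c = dx * dx + dy * dy
    if abs(a) < 1e-12:
        if abs(b) < 1e-12:
            return None
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        s = math.sqrt(disc)
        roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    times = [t for t in roots if t > 0.0]
    if not times:
        return None
    t = min(times)
    # Displacement to be flown through the air; its bearing is the course to steer.
    hx, hy = dx + rx * t, dy + ry * t
    course = math.degrees(math.atan2(hx, hy)) % 360.0
    return course, t


# Stationary target 10 units due north, wind blowing towards the east at 10,
# helicopter air speed 100: the helicopter must steer slightly west of north.
print(course_to_steer((0.0, 0.0), (0.0, 10.0), (0.0, 0.0), (10.0, 0.0), 100.0))
```

Re-running the function with the latest helicopter and target estimates reproduces the repeated re-calculation described above, since each solution assumes straight-line motion only from the current position.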

The target may either be a track in the system or a fixed datum point inserted by an operator. The latter
case allows for suspected target positions which are not being tracked, or for fleeting sonar contacts, etc.

The VECTAC calculation is performed every 4½ seconds during the approach phase up to 13 seconds before
weapon release where a countdown phase is entered (see MATCH Attack). It is essential for the operator to
guide the aircraft on to a stabilised path during the approach phase, or else the engagement will have to be
aborted and a new Vectac initiated.
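
The timing rule can be illustrated with a short sketch. It assumes the calculation interval is 4½ seconds and that the time to go simply counts down with elapsed time, which is an idealisation since in the system it is re-estimated at every calculation; the constant and function names are hypothetical.

```python
RECALC_INTERVAL_S = 4.5       # interval between VECTAC calculations
COUNTDOWN_THRESHOLD_S = 13.0  # countdown phase entered below this time to go


def phase_for(time_to_go_s):
    """Phase of the engagement for a given time to go to weapon release."""
    return "approach" if time_to_go_s >= COUNTDOWN_THRESHOLD_S else "attack"


def recalculation_times(initial_time_to_go_s):
    """Times to go at which the approach calculation would be repeated."""
    times = []
    ttg = initial_time_to_go_s
    while phase_for(ttg) == "approach":
        times.append(ttg)
        ttg -= RECALC_INTERVAL_S
    return times


print(recalculation_times(30.0))  # [30.0, 25.5, 21.0, 16.5]
print(phase_for(12.0))            # attack
```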

The Approach Function calculates the following data for display to the operator and transmission to the
pilot:

(a) Course to Steer (CTS) Magnetic or True.

(b) Distance to Fly to Weapon Release Point (DTG).

(c) Time to go to Weapon Release Point (TTG).

It also controls the synthetic display of a Drop Point on the Controller's radar display.

2.3  MATCH Attack

The Attack Function controls the final approach of the helicopter on its current fixed course. During the
attack, a countdown is relayed to the pilot so that he knows exactly when to release the weapon. The same
procedure applies in a MADVEC since it is necessary for the pilot to know exactly when he is over the target
area in order to record the MAD detection (or to mark the output of a pen recorder).

Once the weapon has been released the facility calculates where the torpedo will hit the water, and
initiates the display of a splash point on the radar display, together with a surrounding weapon danger area
(dogbox).




ANNEX  1.3 

EXAMPLE  FUNCTIONAL  DOCUMENTATION  LEVEL  2  FUNCTION  1 
MATCH  PREPARATION 


1.  INTRODUCTION 

This function is concerned with the insertion of parameters necessary for the calculation of a VECTAC and
for the initiation of radar tracking on the helicopter. The actions involved can be carried out prior to
and in anticipation of an engagement.

2.  MODE  OF  OPERATION  (References  refer  to  FBD)

Initiation of a MATCH engagement will generally come from the Anti-Submarine Warfare Director. On his
command the variable parameters are manually injected into the system, led by the injection to convert   (6)
Relative Wind to True Wind. The injections can be checked on the Check Line readout of the tote, or at any   (2)
stage by query injections.

According to radar considerations, the operator will also select auto or manual tracking of the helicopter,   (8)
thus implementing the Radar Manual Tracking or Radar Autotracking Functions.   (11-13)

When all the variable parameters have been inserted, and the helicopter track has been initiated, the   (14)
operator can proceed to the MATCH Approach phase.

3.  DESCRIPTION

3.1  Conversion  of  Relative  Wind  to  True  Wind  (1—1) 

Relative Wind is displayed on a VCS unit which indicates direction and speed (relative to ship's motion).

The VECTAC requires true wind and hence the following injection is used to input relative and obtain true
wind:

RW? W105P S18

"Display in the readout the True Wind direction and speed, where the relative direction and speed are
as indicated (e.g. Red 105°, 18 knots)".

After the conversion, the readout is presented as:

W215 (direction to ± 1°)

S25 (speed to ± 1 knot)
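
The conversion itself is a vector addition of the relative wind and the ship's own motion. The sketch below is a generic wind-triangle calculation, not the system's implementation: it assumes wind directions are quoted as the direction the wind blows from, that Red (port) bearings are negative and Green (starboard) bearings positive, and that ship's course and speed are available. The paper does not state the ship's motion used in its example, so the figures below are invented.

```python
import math


def true_wind(ship_course_deg, ship_speed, rel_bearing_deg, rel_speed):
    """Return (direction_from_deg, speed) of the true wind.

    rel_bearing_deg is the bearing the relative wind comes from, measured
    from the bow: negative for Red (port), positive for Green (starboard).
    """
    from_dir = math.radians(ship_course_deg + rel_bearing_deg)
    # Relative wind as a velocity (where the air moves to), east/north components.
    ax = -rel_speed * math.sin(from_dir)
    ay = -rel_speed * math.cos(from_dir)
    course = math.radians(ship_course_deg)
    sx = ship_speed * math.sin(course)
    sy = ship_speed * math.cos(course)
    # True wind velocity = relative wind velocity + ship velocity.
    tx, ty = ax + sx, ay + sy
    speed = math.hypot(tx, ty)
    direction = math.degrees(math.atan2(-tx, -ty)) % 360.0
    return direction, speed


# Ship steering north at 10 knots with a relative wind of Red 90 at 10 knots.
direction, speed = true_wind(0.0, 10.0, -90.0, 10.0)
print(round(direction), round(speed, 1))  # 225 14.1
```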

3.2  Insertion  of  True  Wind  (6) 

The true wind velocity, required for calculation of helicopter's relative velocity, is inserted by the
following injection, using data from the relative to true conversion:

HC + W215 S25

True wind value is as indicated (e.g. direction 215°, speed 25 knots).

3.3  Insertion of Ballistic Correction and Indicated Air Speed  (6)

Ballistic Corrections are available to operators for each aircraft type which may be used in a Vectac. The
correction is a distance value (horizontal displacement Weapon Release Point to Splash Point) which is
applicable to the aircraft's Vectac engagement speed (IAS). The value applies to a preordained altitude at
which the aircraft will fly.

The MI Control function requires these to be inserted as a parameter couplet in the form:-

HC + R16 S90

Ballistic Correction and Indicated Air Speed
values are as indicated (e.g. 160 yards, 90 knots).

For a MADVEC, the Ballistic Correction is indicated as zero.

3.4  Insertion  of  Magnetic  Variation  (6) 

The MATCH Approach Function automatically converts helicopter bearings to magnetic for relay to the pilot.

The variation is injected as:

HC + V-7

"Magnetic Variation is as indicated
(e.g. 7° East)"

Note: If a True heading is required (as will be necessary for some types of helicopter) a variation
of + or - should be injected.
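
As a worked illustration of the bearing conversion, the sketch below assumes that the injected value is the signed correction added to a true bearing, so that 7° East is entered as -7 in line with the example injection. This sign convention is inferred from the example and is not stated in the text; the function name is hypothetical.

```python
def to_magnetic(true_bearing_deg, injected_variation_deg):
    """Magnetic bearing from a true bearing and the injected variation.

    Assumes the injected value is added directly, e.g. -7 for 7 degrees East.
    """
    return (true_bearing_deg + injected_variation_deg) % 360.0


print(to_magnetic(215.0, -7.0))  # 208.0
print(to_magnetic(3.0, -7.0))    # 356.0
```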

3.5  Insertion of Initial Search Depth

Initial Search Depth (ISD) is required so that Vectac can allow for the time taken for the torpedo to reach
an effective search position, when calculating the desired splash point. The value is injected as:

HC + U25

"ISD is as indicated (e.g. 250ft)".

For a MADVEC, the ISD value is indicated as zero.

3.6  Checking Vectac Parameters

Manual injections are also available to enable one of two groups of stored Vectac Parameters to be
interrogated at any time:

HC? HSV

"Display in the readout stored values of True Wind
Direction and Speed and the Magnetic
Variation".

Readout (e.g.): W215
                S035
                V-07

HC? RSU

"Display in the readout stored values of Ballistic
Correction, IAS and ISD".

Readout (e.g.): R016
                S090
                U250


3.7  Tracking  the  Helicopter 

The operator selects the mode of tracking and the type of equipment to use according to conditions and
whether the helicopter is on board or airborne. Reference should be made to Picture Compilation for detail
on the tracking functions. If the helicopter is on deck and cannot be tracked normally, a manual track is
set up alongside the ship for correlation with the helicopter when it is airborne. Alternatively, tracking
may be initiated using the RRA equipment and the helicopter's transponder. The Vectac calculations can
start as soon as initiation is made. It is possible to change the tracking mode from Manual to Primary to
Secondary radar during any phase of a MATCH engagement without affecting the VECTAC.


FBD references: (6)  (6)  (5)  (5)  (6, 11-13)




ANNEX  1.4 

EXAMPLE  FUNCTIONAL  DOCUMENTATION  LEVEL  2  FUNCTION  2 
MATCH  APPROACH 


1.  INTRODUCTION 

The  MATCH  Approach  Function  calculates  the  VECTAC  solution  at  regular  intervals  between  initiation  and  the 
final  attack  phase.  As  a  result  of  the  calculations,  the  helicopter  is  guided  to  the  target  area. 

2.  MODE  OF  OPERATION  (References  refer  to  FBD) 


The VECTAC calculation is commenced by the initiation command, which may specify either a track number or a
datum point for the target. (10)

The calculation is carried out once every 4½ seconds until time to go (TTG) is less than 13 seconds. (11, 1-4)

As a result of the calculation, the aircraft controller's tote and labelled radar display are updated with
engagement symbology and guidance commands for the helicopter pilot. (5-9)


The  VECTAC  can  be  cancelled  prematurely  if  required. 

3.  DESCRIPTION 

3.1  Selection  of  Target  Type  (10) 

If  the  target  is  being  tracked  by  the  system,  the  VECTAC  will  be  initiated  using  the  target's  and 
helicopter's  track  numbers.  Otherwise  the  target  position  is  marked  by  tracker  ball. 

3.2  VECTAC  Initiation  for  Tracked  Target  (10) 

Initiation of the VECTAC is made by the following Manual Injection:

HC?  4731  00465

"Calculate VECTAC for target and helicopter tracks
indicated, and display the solution in the readout
(e.g. target track 4731, helicopter track 0465)".

The target track may alternatively be indicated by placing the tracker ball over the track and injecting:

HC?  |TB|  00465

A  North  side  VECTAC  can  only  be  initiated  if  a  North  side  console  is  not  already  progressing  a  VECTAC 
(similarly  for  South  side). 

3.3  VECTAC  Initiation  for  a  Fixed  Point  Target  (12) 

The following manual injection is used to initiate a VECTAC on a fixed datum point:

HC?  P|TB|  00465

"Calculate VECTAC for the fixed position using the indicated
helicopter (e.g. 0465) and display the solution in readout".

The restrictions on initiation explained in Paragraph 3.2 also apply to this operation.

3.4  Cycling  the  VECTAC  Calculation  (11) 

Every VECTAC Calculation provides a solution with which the helicopter can be controlled. In order to
allow for changes in course of the target, and deviations from defined course by the helicopter, the
calculation is repeated every 4½ seconds during the approach phase.

3.5  The  VECTAC  Calculation  (2_1|) 

Refer  to  Figure  . 

The aim of the VECTAC calculation is to make the helicopter reach the Splash Point (SP) at the same time as
the submarine reaches point A. The distance x (Weapon Effective Correction) is such that the torpedo will
fall to Initial Search Depth in the same time as the submarine takes to reach the Splash Point (it is
assumed that the torpedo travels vertically downwards once it is in the water).

In order for the torpedo to enter the water at the splash point, it is necessary for the weapon to be
released at the Drop Point (DP). The distance y, known as the Ballistic Correction, is supplied as a VECTAC
Parameter and is dependent on the aircraft and weapon type. Ballistic Correction is not corrected for wind
during the calculation.

Once the distance x is known and the current position of the submarine and helicopter have been derived, the
vector velocity solution can be calculated. Other factors available or derived for the solution are:


Helicopter IAS
True Wind
Target True Course and Speed

The result of the vector calculation is the indicated course of the helicopter. There are generally two
solutions to this calculation (a quadratic); the function adopts the solution which provides the maximum
closing relative velocity.
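The quadratic mentioned above can be illustrated with a minimal sketch. It is not the system's own algorithm: it assumes flat-earth coordinates (x east, y north), constant target velocity and wind, and a single aim point that already includes the corrections x and y. The function name intercept and its parameters are hypothetical. "Maximum closing relative velocity" is taken here to mean the earliest positive intercept time, which is an interpretation rather than something the text states.

```python
import math

def intercept(rel_pos, target_vel, wind_vel, airspeed):
    """Return (heading_deg, time) to reach a moving aim point, or None.

    rel_pos    -- aim point position relative to the helicopter (east, north)
    target_vel -- aim point velocity over the ground (east, north)
    wind_vel   -- true wind velocity over the ground (east, north)
    airspeed   -- helicopter speed through the air, same units as the velocities
    """
    rx, ry = rel_pos
    # Work in the air-mass frame: the aim point appears to move at
    # (target velocity - wind), and the helicopter moves at its airspeed.
    vx = target_vel[0] - wind_vel[0]
    vy = target_vel[1] - wind_vel[1]

    # |r + v t| = airspeed * t  gives  a t^2 + b t + c = 0
    a = vx * vx + vy * vy - airspeed ** 2
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry

    if abs(a) < 1e-12:
        # Equal speeds: the quadratic degenerates to a linear equation.
        roots = [-c / b] if abs(b) > 1e-12 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None  # no real solution: the aim point cannot be reached
        sq = math.sqrt(disc)
        roots = [(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)]

    positive = [t for t in roots if t > 0.0]
    if not positive:
        return None
    t = min(positive)  # earliest intercept = fastest closure (assumed reading)

    # Required air velocity, converted to a compass heading (0 = north).
    ux = rx / t + vx
    uy = ry / t + vy
    heading = math.degrees(math.atan2(ux, uy)) % 360.0
    return heading, t
```

For example, an aim point 1000 units due north moving east at 30, with no wind and an airspeed of 50, gives a heading of about 037 degrees and an intercept time of 25 time units.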


3.10  Controller's Tote Readout  (5,6)

During a MATCH Approach a readout of the following form is produced:

0465  (Helicopter Track Number)

C105  (Helicopter Course, e.g. 105°)

0075  (Distance to Fly, e.g. 7.5 dmls)

5M00  (Time to Weapon Release, e.g. 5 min. 0 sec.)
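The four-line readout above can be mimicked with a short formatting sketch. The field widths and scaling are inferred from the single example shown (four-digit track number, "C" plus three-digit course, distance in tenths, minutes and seconds separated by "M"); the real tote format may differ, and the function name tote_readout is hypothetical.

```python
def tote_readout(track, course_deg, distance, seconds_to_release):
    """Format the four tote lines for a MATCH Approach (illustrative only)."""
    minutes, seconds = divmod(int(seconds_to_release), 60)
    return [
        f"{track:04d}",                           # helicopter track number
        f"C{int(round(course_deg)) % 360:03d}",   # helicopter course in degrees
        f"{int(round(distance * 10)):04d}",       # distance to fly, in tenths
        f"{minutes}M{seconds:02d}",               # time to weapon release
    ]
```

Calling tote_readout(465, 105, 7.5, 300) reproduces the example lines 0465, C105, 0075 and 5M00.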

3.11  Controller's  LRD  Display  (7,8) 

The Helicopter and target tracks will be on display, as supplied by the Track Formation function.

The MATCH Approach Function supplements the display with a synthetic marker for the Drop Point, in the form
of an asterisk.

3.12  Guidance of the Helicopter  (9)

The controller relays the data on his readout to the pilot to enable him to steer the helicopter.

3.13  Detection of Completion of MATCH Approach  (4)

When TTG reaches 13 seconds the Approach phase is complete, so the calculation cycle is terminated and
control passes to the Attack Function.
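The approach cycle described in this annex (repeat the calculation at a fixed interval, hand over to the Attack Function at 13 seconds to go, allow premature cancellation) can be sketched as a simple loop. This is an illustration under stated assumptions, not the documented implementation: run_approach and the solve callback are hypothetical names, time is simulated rather than taken from a clock, and the cycle period is passed in as a parameter.

```python
def run_approach(solve, cycle_s, handover_s=13.0):
    """Repeat the VECTAC calculation until hand-over or cancellation.

    solve(elapsed) -- hypothetical callback returning time to go (TTG) in
                      seconds, or None if the operator cancelled the VECTAC
    cycle_s        -- interval between successive calculations, in seconds
    handover_s     -- TTG at which control passes to the Attack Function
    """
    elapsed = 0.0
    history = []
    while True:
        ttg = solve(elapsed)
        if ttg is None:
            return "cancelled", history   # premature cancellation
        history.append((elapsed, ttg))    # would drive the tote and display
        if ttg <= handover_s:
            return "attack", history      # approach phase complete
        elapsed += cycle_s
```

With a target 30 seconds away and a cycle period of 4.5 seconds, run_approach(lambda t: 30.0 - t, 4.5) performs five calculations and returns "attack" once TTG has fallen to 12 seconds.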

3.14  Premature  Cancellation 

The VECTAC can be cancelled prematurely by use of the standard "cancel readout" injection, e.g. "DR-".

This effectively clears the inhibit on further VECTACs described in para. 3.2.




ANNEX 1.5

EXAMPLE  FUNCTIONAL  DOCUMENTATION  LEVEL  2  FUNCTION  3 
MATCH  ATTACK 


1.  INTRODUCTION

The MATCH Attack Function controls the final phase of a MATCH engagement when the helicopter is on a steady
course and counting down to Weapon Release Time. It also provides information to the controller when the
torpedo has been dropped, for tactical evaluation.

2.  MODE OF OPERATION

Refer to the Function Block Diagram.

The Attack phase is initiated by the signal Commence Attack, which is received 13 seconds before Weapon
Release. This signal starts the countdown in seconds. (1)

The countdown display on the tote is relayed to the pilot by the controller. When the countdown reaches
zero, the pilot launches the weapon. (2-4)

Once the weapon has been released the Drop Point display is removed and a Splash Point and Weapon Danger
Zone (Dogbox) are painted instead. This enables the controller to assess whether the engagement was
accurate, with reference to the target track. (5-9)

3.  DESCRIPTION 

3.1  Countdown  Control 

The MATCH Attack Function is controlled by a one second countdown which initiates a tote update until TTG
equals zero. (1)

3.2  Countdown Readout  (2,3)

During the attack the tote readout only shows the countdown value:

0007  (1st line, e.g. 7 seconds)

This value is relayed to the pilot by the aircraft controller. (4)

3.3  Action at Weapon Release Time

When TTG = 0 the display of the Drop Point is removed and instead a splash point and dogbox are painted. The
helicopter pilot operates the launch controls for the torpedo. The inhibit on further VECTACs is removed at
this time. (5, 10)

3.4  Splash  Point  Display 

The splash point co-ordinates are calculated and a request is made for the Picture Control Function to
generate a Splash Point and Dogbox. (7) These two markers are deleted automatically after 7½ minutes by Picture
Control. Alternatively they may be cleared on demand by the appropriate Picture Control Injection. (8-10)

3.5  Terminating  the  Engagement 

The aircraft controller evaluates the engagement according to his display data and the helicopter pilot's
report. A new engagement may be initiated by the ASWD if desired. (11)

If no new engagement is required the operator should clear the 0000 readout by injecting "DR-".




DISCUSSIONS 
SESSION  III 

REFERENCE NO. OF PAPER: III-11
DISCUSSOR'S NAME: J. T. Martin, Ferranti
AUTHOR'S NAME: A. A. Callaway

COMMENT: I notice that the program uses a constant overhead to allow for command and status words and
the response times of the terminals. Have you considered allowing the response time to be a variable
rather than a constant, the variable being specified on a per RT basis, if required? This would allow
known response times of terminals to be inserted. On the same basis, are you intending to extend the
program so as to cope with acyclic messages? In such a case, I would recommend that the program accept
as a parameter the amount of time allocated to all acyclic messages and produce as an output the
average and maximum waiting time before the acyclic message is handled.

AUTHOR'S REPLY: I feel that the average value of overhead is sufficient for the purpose, especially
since the controller/terminal and terminal/terminal overheads are resettable parameters. The
inter-word and inter-message gap figures preset into the program are 6 µs, thought to be a reasonable
average value between the 4 µs and 10 µs which MIL-STD-1553B specifies. If the user observes that his
terminals involve a spread of response times, then a representative average value will still suffice in
the analysis; if the system is critically affected by this, then it probably needs some redesign
anyway.

With regard to the second point, it is intended to extend SAVANT into the acyclic regime, and Mr.
Martin's suggestion is gratefully noted.


REFERENCE NO. OF PAPER: III-11

DISCUSSOR'S NAME: Jim McCuen, Hughes Aircraft

AUTHOR'S NAME: Tony Callaway

COMMENT: Can SAVANT be modified and expanded to model a contention protocol, high speed data bus
operating at 50 megabits?

AUTHOR'S REPLY: SAVANT includes resettable parameters, for example the transmission bit rate, word
overheads, message overheads, word length, and message length. These can be changed to any values
characteristic of the protocol one wishes to investigate. For example, we have used it at RAE to
estimate traffic on a 50-megabit time slot bus protocol.


REFERENCE NO. OF PAPER: III-12
DISCUSSOR'S NAME: K. Brammer, ESG
AUTHOR'S NAME: H. 0. Whitehouse

COMMENT: Are you aware of any applications of systolic array processing to nonlinear optimal recursive
filtering?

AUTHOR'S REPLY: Systolic arrays can be used for the computation and inversion of the covariance
matrices associated with Kalman filtering. In the area of nonrecursive nonlinear filtering, systolic
arrays whose elements can perform comparisons can be used for rank-order filtering, especially median
filtering.


REFERENCE  NO.  OF  PAPER:  III-14 
DISCUSSOR'S  NAME:  H.  P.  Kuhlen,  G.  E. 

AUTHOR'S  NAME:  J.  T.  Martin 

COMMENT: To be compatible with your "design documentation," I would like to know how an "ideal"
specification should look. More functional diagrams, or the "old-fashioned" item-by-item specification?

AUTHOR'S REPLY: It does not really matter in what format the requirement specification is presented.
The important thing is to ensure that the requirement specification specifies fully those things that
you require. The specification should not request items that are not required but tend to be put into
the specification because the customer has a preconceived idea of what the design should look like.

For instance, if you require a computer to be able to store data in a non-volatile store then state
that in the requirement; do not specify that a core store should be used, since that is not the requirement
in this case. Of course, if you have a particular reason for needing to use a defined piece of
equipment, for instance so as to provide compatibility with some other unit, then this is one of your
mandatory requirements and should be included in the specification.




REFERENCE NO. OF PAPER: III-14

DISCUSSOR'S NAME: Dr. N. J. B. Young, Ultra Electronic Controls
AUTHOR'S NAME: J. T. Martin

COMMENT: You have talked about an aid to system design, but not covered testing or post-development
modifications to the customer's specification. It is our experience (in Ultra Electronic Controls
Ltd.) that these absorb a very high percentage of costs. Can you tell us something of how your system
design aid can be applied to system (hardware and software) testing and post-development modifications,
and whether it makes them easier or more difficult?

AUTHOR'S REPLY: Functional documentation helps you to move from the original system requirement
specification to the specification necessary for the hardware and software required to produce the
system. Test specifications for the system come from the FD because the FD describes in an easily
understandable form the attributes that must be proven to exist.

If the customer's specification changes after development, the functional documentation lower
levels can be used to discover whether the change to the specification will affect the software or the
hardware or both, and also help to choose between a change to the software or a change to the hardware
if a choice exists.

As stated above, FD is used in the overall system design phase. Those attributes which the system
must exhibit are defined; data structures, processing functions, and crew actions to meet this defined
system requirement are detailed on the FD if they are required to make the system function in the required
manner; otherwise they are left to the hardware, software, or user design stages.

Functional documentation does produce a good interface between the designer and the customer,
because it is so understandable. However, it also allows the design to proceed in a logical top down
manner and thus offers all the other advantages listed in the paper.


REFERENCE NO. OF PAPER: III-14

DISCUSSOR'S NAME: J. P. Quemard, Electronique Marcel Dassault
AUTHOR'S NAME: J. T. Martin

COMMENT: Three remarks on the method presented:

-  no mention of the data structure aspect

-  no functional regrouping of the processing

-  no management of cross-references or of chronograms (timing diagrams)

Isn't it merely a way of presenting documentation to the customer rather than a
working method?

AUTHOR'S REPLY: The response to this question is included in the response to Dr. Young's question.


REFERENCE  NO.  OF  PAPER:  III-14 
DISCUSSOR'S NAME: Dr. van Keuk, AVP Member
AUTHOR'S NAME: J. T. Martin

COMMENT: I would agree with you that in the early phase of system design you may not need any computer
assistance. The main reason may be that many of the ideas to be invoked are unsharp. But, of course,
in the software design phase computer assistance in large systems is necessary to define the data
organization, of positioning, and things like these.




AUTHOR'S REPLY: The question asked was whether the use of interactive computer display techniques
would be useful for producing the functional documentation designs. The answer is: not at all. When
the FD diagrams are produced, each diagram is the combined effort of a number of people working
together using pencils on a piece of paper. The initial diagram produced is very rough; not even
rulers are used to draw the lines. When the diagram has been produced by the engineer it is redrawn by
a technical author, but this is a small task.

The paper concerned itself with only system design. Functional documentation is used at the
system design stage. Other techniques are used for software, hardware, and user designs. I would just
briefly state that computer-aided design techniques and simulation are useful in all the three areas.




A CONSISTENT APPROACH TO THE DEVELOPMENT OF
SYSTEM REQUIREMENTS AND SOFTWARE DESIGN

A. O. Ward

British  Aerospace  Public  Limited  Company 
Aircraft  Group 
Warton  Division 
Preston 
PR4  1AX 
United  Kingdom 


SUMMARY 


Some of the problems encountered in the development of system and software
requirements are discussed and generalised solutions suggested. A specific approach is
described, the SAFRA Project, including extensions into the area of software design.

This  approach  embraces  the  use  of  a  new  methodology,  Controlled  Requirements  Expression 
(CORE)  interfaced  with  a  computer  based  System  Description  Language  for  storage  and 
automatic  analysis.  Software  design  assumes  the  use  of  a  MASCOT  rationalised  executive 
and  CORAL  as  the  implementation  language.  Experimental  procedures  for  the  automatic 
extraction  of  CORAL  programmes  from  detailed  requirements  held  on  a  database  are  discussed. 

The  techniques  are  illustrated  via  an  example  based  on  the  processing  associated  with 
a  Fuel  Management  System. 


1.  INTRODUCTION 
1. 1  Problem  Areas 


During the latter half of the previous decade there was a growing awareness of
the importance of the requirements for embedded software, particularly for large
projects with lifecycles extending over many years. Two of the major problems are
the desire for realisation leading to insufficient resources being allocated to
the requirements phase and the apparent inability to communicate the requirement
effectively to the implementer.

A more specific case for examining the way in which requirements are developed may
be made by noting that the cost of changes to software increases over the lifecycle,
particularly when many of the errors which precipitate such changes may be traced
to inadequacies in requirements and design. Also, traditionally a relatively small
percentage of the procurement budget is devoted to requirements and so a large
absolute increase in the resources devoted to requirements will lead to a
relatively small percentage increase in the overall cost of a project. These three
pieces of evidence suggest that increased investment in the early stages of projects
involving embedded software will potentially have a large cost leverage on their
success. Unfortunately this is difficult to achieve because although funding can
usually be found to solve critical problems just prior to entering service it is
hard to convince people of the worth of investment early on in projects.

There are a number of areas in which requirements can be improved, some outside
the scope of this paper; those addressed by the technique described here are
discussed below.

A  heavy  reliance  on  the  use  of  a  natural  language  invariably  leads  to  ambiguity. 
English,  although  semantically  a  very  rich  language,  is  notoriously  open  to 
interpretation.  Imposing  a  detailed  format  on  a  requirement  document  goes  some 
way  towards  alleviating  this  problem  but  if  the  detail  itself  is  communicated 
using  English  the  problem  will  still  remain. 

Projects of reasonable size will inevitably lead to the requirements phase being
undertaken as a team activity and this in turn causes problems when attempts are
made to assess the consistency of the various documents produced. Again, the use
of English and the lack of any detailed structure prevent the use of formal
methods to check for consistency. The many stages involved in current
implementations not only make it difficult to demonstrate conformance but increase
the risk of errors due to the many communication interfaces that have to be
crossed. Finally, requirements are usually incomplete and as suggested earlier
this is usually due to insufficient effort rather than a lack of methodology or
formalism. However, there are some classes of information which, due to the
conventional form of requirements, it is almost impossible to check for
completeness. In addition, current approaches do not have the mechanism for
accommodating viewpoints which although seemingly irrelevant at the early stages
of a project will become very relevant once the system approaches service.




1.2  General  Solutions 


These problem areas, we believe, can be addressed by the following means. Ambiguity
can only be solved by using a precise method of expression, such as the diagrammatic
notation shown in Fig. (1). Here, the simple expedient of representing processes
in boxes, data on arrows and depicting time ordering as left to right across the
paper provides an unambiguous picture of the relationships between the processes
and their sequence.


Validation  for  consistency  is  impossible  without  a  visible  information  structure. 
If  the  simple  system  shown  in  Fig.  (1)  is  to  be  described  in  such  a  way  as  to 
assist  a  consistency  check  then  a  suggested  information  structure  could  be  as 
follows : 


PROCESS:      Centre of Gravity Calculation;
PART OF:      Mass C of G Calibration;
USES:         Individual Tank Fuel Mass;
              Wing Sweep Position;
DERIVES:      Fuel Centre of Gravity;
COMES AFTER:  Mass Calculation;


The reserved words in capitals are a selection of specific information categories
to which have been assigned the objects and relationships shown in the diagram.
Clearly the checking of two descriptions for consistency is simplified by having
such a structure and this is illustrated in Fig. (2) where two inconsistent
descriptions of the functions depicted in Fig. (1) are given and some of the
errors highlighted. Because of its mechanistic nature this process is amenable
to automation provided we have access to a language which can be used to describe
the structure and a database in which to hold the information.
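The kind of mechanical consistency check described above might be sketched as follows. This is only an illustration: the dictionary layout and the function name inconsistencies are assumptions made for the example and do not represent the language or database used by the author.

```python
def inconsistencies(desc_a, desc_b):
    """Compare two descriptions of one process, category by category.

    Each description maps an information category (e.g. "USES") to a list
    of object names. Returns (category, only_in_a, only_in_b) for every
    category on which the two descriptions disagree.
    """
    problems = []
    for category in sorted(set(desc_a) | set(desc_b)):
        a = set(desc_a.get(category, ()))
        b = set(desc_b.get(category, ()))
        if a != b:
            problems.append((category, sorted(a - b), sorted(b - a)))
    return problems


first = {
    "PROCESS": ["Centre of Gravity Calculation"],
    "USES": ["Individual Tank Fuel Mass", "Wing Sweep Position"],
    "DERIVES": ["Fuel Centre of Gravity"],
    "COMES AFTER": ["Mass Calculation"],
}
# A second, deliberately inconsistent description omits one input.
second = dict(first, USES=["Individual Tank Fuel Mass"])

for category, only_first, only_second in inconsistencies(first, second):
    print(category, only_first, only_second)
```

Because every statement is filed under a named category, the comparison reduces to set differences, which is what makes the check amenable to automation.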


Conformance can only be preserved and demonstrated if the requirement has a
structure which allows this. Such a structure, of course, will correspond to the
various stages of development as well as the detailed steps through each stage
and the documentation produced. A good analogy is a notional aircraft drawing
scheme. Here a general arrangement (G.A.) will be originated from which, at the
next level, some features will be represented in a little more detail via a
feature G.A. The latter will be broken down into assemblies and in turn to
sub-assemblies, the final stage being a detailed part which can be handed to the
implementer for manufacture. One can observe that there are several levels of
detail and at each level the customer and designer assess in turn whether the
design is practicable, will satisfy the requirement and if it is correct. The
hierarchy of information that each drawing level represents can be seen to be a
logical decomposition of the preceding levels and there is an unambiguous method
of expressing the design (i.e. a drawing system with standards). In addition the
interrelationships between various levels and drawings at the same level are
referenced on the diagrams.

When applied to the development of system and software requirements it is clear
that such an approach, if applied rigorously, would enable conformance to be
established via a series of small increments of detail. Equally, the effect of
changes to the requirement can be quickly traced through the hierarchy in order
to establish the functions affected by such a change.
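Tracing a change through such a hierarchy amounts to walking the decomposition links downwards from the changed item. The sketch below assumes the hierarchy is held as a mapping from each requirement to its lower-level components; the function name affected and the top-level entry "Fuel Management" are hypothetical, chosen only to echo the example process names used earlier.

```python
def affected(children, changed):
    """Return, sorted, every lower-level item reachable from `changed`."""
    seen = []
    stack = [changed]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.append(child)
                stack.append(child)
    return sorted(seen)


# Hypothetical two-level decomposition, for illustration only.
hierarchy = {
    "Fuel Management": ["Mass C of G Calibration"],
    "Mass C of G Calibration": [
        "Mass Calculation",
        "Centre of Gravity Calculation",
    ],
}

print(affected(hierarchy, "Fuel Management"))
```

A change at the top level reports all three lower-level functions, whereas a change to a leaf such as "Mass Calculation" reports none.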

Finally, completeness is satisfied mainly by the allocation of adequate resources
and sufficient time to the requirements phase; however, as stated earlier,
mechanisms for partitioning work with complete interfaces in a team activity
must be sought.

A three element solution to the derivation of requirements which will alleviate
these problems is provided by the use of a:

•  Methodology 

•  Standard 

•  Automated  Aid 

and  we  will  discuss  these  briefly  in  turn. 

The methodology should encompass the process by which requirements are both
developed and expressed. It must be usable by engineers, as opposed to systems
analysts in the traditional sense, and have a notation which is not only simple
to use but is relatively transparent to the technical content. The latter is
important from the customer's point of view where it should not be necessary for
him to have a detailed understanding of the methodology in order to undertake
technical reviews of the documentation produced. It should be applicable to any
stage of system conception as far as the customer will allow. It should impose
no constraints on design decisions but rather provide the necessary cues to the
engineer so that such decisions are made at the appropriate level of detail at
the correct time.




The standard should provide the information structure and the tests to be made
against such a structure. These quality control actions should be matched by
rigorous configuration control procedures. Automated aids to the application of
the standard via the use of computer based tools to implement the mechanistic
aspects of the quality control procedures should have a minimal impact on the
prime method of expression used by the engineer. Where the tests cannot be
carried out automatically, reports should be provided which give maximum assistance
to the engineer in checking his requirement. One should also aim to minimise the
data preparation task. The important elements of such a tool are shown in Fig. (3).

The notation used for expression is described using a System Description Language
(SDL) via the information structure specified in the standard. Once the
requirement is in this form it may be checked using an Input Analyser which not
only assesses the validity of the input in its own right but also its consistency
with information already held on the database.

Once  held  on  the  database  it  may  be  subjected  to  the  repertoire  of  reports 
available  for  analysis  or  interfaced  with  other  tools. 

1.3  Software  Design  Interface 

The  software  design  interface  is  yet  another  barrier  to  successful  communication 
of  needs  and  we  should  seek  to  minimise  discontinuities  by  aiming  for  a  more 
gradual  transition  between  requirements  and  implementation.  If  possible  the 
notation  and  methodology  used  in  the  requirements  phase  should  be  consistent  with 
those  used  during  design.  Also,  if  use  is  made  of  a  rationalised  (and  ideally 
standard)  executive  to  specify  software  module  communication  and  control  it  should 
be  possible  to  produce  a  more  formal  route  map  between  requirements  and  basic 
design. 

Similarly  if  a  standard  Higher  Order  Language  (HOL)  is  employed  then  this  argument 
can  also  be  used  to  justify  a  similar  formalism  between  detailed  requirements  and 
the  subsequent  software. 

In  the  next  section  we  will  attempt  to  describe  a  specific  approach  to 
requirements  and  design  which  it  is  hoped  goes  some  way  towards  satisfying  the 
above.  Some  aspects  of  the  approach  are  still  experimental  and  these  will  be 
highlighted  in  the  discussion. 

2.  SPECIFIC  APPROACH 

2.1  SAFRA  Project 

A  specific  approach  to  requirements  and  software  design  is  suggested  by  the  SAFRA 
project.  Semi  Automated  Functional  Requirements  Analysis  (SAFRA)  is  a  proposed 
approach  to  requirements  and  software  design,  consisting  of  existing  methods  and 
tools  and  new  ones  currently  under  development.  A  more  detailed  picture  of  the 
background  to  SAFRA  and  its  initial  objectives  and  assumptions  can  be  found  in 
Ref. (1). If we consider a phased life cycle approach (Fig. (4)) then what is
proposed  is  a  consistent  set  of  methods  and  tools  for  each  phase.  These  will  be 
described  below  in  as  much  detail  as  this  paper  will  allow  but  with  reference  to 
the  discussion  above  they  may  be  summarised  as  follows. 

The  methodology  used  by  the  engineer  to  develop  and  express  the  requirement  is 
Controlled  Requirements  Expression  (CORE).  This  is  a  new  technique  developed 
jointly  by  B.Ae.  Warton  Division,  and  Systems  Designers  Ltd.,  embracing  a  method 
for  the  assembly  and  analysis  of  information  relevant  to  a  requirement,  and  an 
easily  understood  diagrammatic  notation  as  the  method  of  expression. 

The automated aid to validation and storage of both requirement and software design
is the University of Michigan's Problem Statement Language and Problem Statement
Analyser (PSL/PSA).

The software design interface assumes the continuing use of CORE notation to
produce detailed specifications with storage using PSL/PSA but aimed at the use
of a rationalised executive and HOL. The former is the Modular Approach to
Software Construction Operation and Test (MASCOT) and the latter is the UK MoD
standard CORAL 66. A further assumption is the use of a commercially available
MASCOT based software development and test environment for the testing phase
working on the host-target principle.

2.2  Controlled Requirements Expression

CORE is a method of analysing and expressing requirements in a controlled and
precise manner. It enables a subject requirement to be expressed as either a
number of lower level requirements or as a component part of some higher level. Any
lower level requirement derived using CORE may in turn be subjected to the method to
produce a hierarchy of lower levels. The lowest level is that at which the full
method need no longer be applied and one may resort to strictly hierarchical
decomposition making use of the notation alone. This is considered to occur after
the basic design stage has taken place. In general the same notation is employed at all
levels of requirement and design and some of the major symbols are illustrated in Fig. (5).




CORE  diagrams  utilise  boxes  to  represent  processes  and  arrows  to  represent  data. 

The  diagrams  are  time  ordered  from  left  to  right  and  thus  the  box  ordering 
specifies  the  sequence  in  which  processes  must  occur.  Symbol  free  boxes  shown  in 
parallel  indicate  indeterminate  order  and  overlapping  indicates  a  number  of 
identical  processes  occurring  in  parallel.  All  input  data  entering  a  CORE 
diagram  is  referenced  to  a  source  and  all  output  data  to  a  destination. 

Data  arrows  may  also  be  used  to  describe  repetition,  selection  and  condition. 

Those arrows appearing from the top of a process box point to a reference which
indicates that this process is functionally equivalent to the one described at
the reference. Those appearing at the bottom of a process box indicate the
mechanism that performs the process. Iteration is shown by an asterisk in the top
right-hand corner of a process box and mutual exclusion by a small circle in the
top left-hand side.

The  method  comprises  eleven  logical  steps  which  when  applied  to  a  subject 
requirement will decompose it into its lower level components and these are
summarised  below. 

The  method  has  three  stages  for  each  level  of  decomposition  which  may  be  summarised 
as 


•  Information Gathering

•  Propose  Relationships 

•  Prove  Relationships 

Information is gathered with respect to a number of subdivisions of the problem,
referred to as Viewpoints, in terms of input and output data and gross functions.
This  information  is  refined  by  a  Data  Decomposition  Step  which  specifies  in  more 
detail  the  data  already  tabulated. 
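
The tabulated information lends itself to a simple mechanical cross-check. The following Python sketch is illustrative only: the class `TabularEntry`, the function `unmatched_flows` and the sample data names are assumptions made for this example and are not part of CORE or of the tools described in this paper. It records, for each Viewpoint, the source of every input and the destination of every output, and reports any flow that one Viewpoint claims but its partner does not.

```python
from dataclasses import dataclass, field


@dataclass
class TabularEntry:
    """Inputs and outputs gathered for one Viewpoint (hypothetical layout)."""
    viewpoint: str
    inputs: dict = field(default_factory=dict)    # data name -> source Viewpoint
    outputs: dict = field(default_factory=dict)   # data name -> destination Viewpoint


def unmatched_flows(entries):
    """Return (data, source, destination) flows recorded by only one side."""
    known = {e.viewpoint for e in entries}
    sent = {(d, e.viewpoint, dst) for e in entries for d, dst in e.outputs.items()}
    received = {(d, src, e.viewpoint) for e in entries for d, src in e.inputs.items()}
    # Flows crossing the problem boundary cannot be checked at this level,
    # so only flows between two known Viewpoints are reported.
    internal = {f for f in sent ^ received if f[1] in known and f[2] in known}
    return sorted(internal)


entries = [
    TabularEntry("FMS PROCESSING",
                 inputs={"SENSOR DATA": "FUEL HANDLING"},
                 outputs={"CONTROL DATA": "FUEL HANDLING"}),
    TabularEntry("FUEL HANDLING",
                 inputs={"CONTROL DATA": "FMS PROCESSING"},
                 outputs={}),   # SENSOR DATA output deliberately left out
]

problems = unmatched_flows(entries)
```

In this made-up data set the check flags SENSOR DATA, which FMS PROCESSING expects from FUEL HANDLING although FUEL HANDLING does not list it as an output.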

Relationships are proposed between the inputs and outputs for each Viewpoint in
turn and for data flowing across the Viewpoint, and these are termed 'single
threads'. The proof of such relationships is done in two ways; first the
interrelationship between Viewpoints is examined and where such links exist new
diagrams in the form of 'combined threads' are drawn. Secondly, as threads
represent only particular paths through system operation and in no way depict
such aspects as parallelism or the operational time ordering of processes
another diagram is required which will illustrate this. This is achieved by the
construction of a 'combined operational' diagram or examining how threads interact
operationally across Viewpoints.

Both of these will lead to iterations through the previous steps precipitating a
more  detailed  examination  of  the  single  threads  for  correct  combination  and  in 
order  to  establish  operational  relationships. 

The  outcome  of  the  above  is  a  partitioned  description  (in  terms  of  Viewpoints) 
with  a  detailed  and  hopefully  complete  picture  of  how  the  Viewpoints  interrelate 
and  react  with  each  other  as  well  as  some  indication  as  to  the  major  functions 
contained  within  them.  It  is  now  possible  to  extract  the  Subject  Viewpoint  as 
the  one  of  interest  and  in  turn  take  Viewpoints  which  describe  how  it  is  composed 
and  repeat  the  methodology  in  full. 

Such decompositions continue until functions emerge which may be seen to be
implemented as software on a particular processor, once decisions have been made
regarding the computing elements. At this stage it is possible to enter into
basic design, but before discussing this phase we must say a little about MASCOT.

2.3  MASCOT

The definition of MASCOT given in Ref. (2) describes it as a set of facilities
for real time programming incorporating features concerned with systems
development and construction, achieving this by providing the following:

(a)  A formalism for expressing the software structure of a multi programmed or
real time system which can be independent of computer configuration and
programming language.

(b)  A  disciplined  approach  for  design,  implementation  and  testing  which  provides 
a  concept  of  modularity  for  real  time  systems  and  added  reliability  brought 
about  by  increased  control  over  access  to  data. 

(c)  An  interface  to  support  the  implementation  and  testing  methodologies  which 
is provided by a small kernel that can either be implemented directly on a
bare  machine  (for  operational  use)  or  on  top  of  a  host  operating  system 
(useful  for  system  prototypes)  as  well  as  software  construction  facilities. 

(d)  A  strategy  for  documentation. 




MASCOT,  as  Ref.  (2)  continues,  views  an  application  system  as  a  number  of 
activities,  or  processes,  which  operate  independently  and  asynchronously,  but 
which cooperate by accessing shared Intercommunication Data Areas (IDAs). Thus
the system can be seen as a network whose nodes are the activities and the IDAs
and whose directed links are pathways for data flow between activities and IDAs.

Although  the  MASCOT  facilities  allow  great  variety  in  the  implementation  of  IDAs, 
it  has  been  found  useful,  for  design  purposes,  to  distinguish  two  conceptual 
classes  of  IDA  according  to  the  nature  of  the  data  flow  which  they  support.  These 
are called channels and pools. A channel supports unidirectional data
transmission; it has an input interface associated with a number of producer
activities and an output interface associated with a number of consumer activities.
A pool provides a permanent data area in which data remains available for
activities to read until it is specifically overwritten. The data in a pool does
not have the essential transience of channel data and reading it does not imply
consumption; conceptually it has a simple bi-directional interface with associated
activities.
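
The distinction between the two classes of IDA can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not MASCOT itself: the class and method names are invented for this example, and the synchronisation that a real kernel and its access procedures would provide between concurrent activities is omitted. It shows only the data-flow property that separates the two classes: reading a channel consumes the item, reading a pool does not.

```python
from collections import deque


class Channel:
    """Unidirectional IDA: each item written is consumed by exactly one read."""

    def __init__(self):
        self._items = deque()

    def write(self, item):
        self._items.append(item)

    def read(self):
        # Reading removes the item: channel data is transient.
        return self._items.popleft()

    def pending(self):
        return len(self._items)


class Pool:
    """Persistent IDA: data stays readable until it is overwritten."""

    def __init__(self, **initial):
        self._data = dict(initial)

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        # Reading does not consume: the value remains for other activities.
        return self._data[key]


gauging = Channel()
gauging.write({"tank": "LWT", "probe": 1})
first = gauging.read()

totals = Pool(running_total=0)
totals.write("running_total", 1200)
seen_twice = (totals.read("running_total"), totals.read("running_total"))
```

After the single read the channel is empty, whereas the pool returns the same value on every read until a further write replaces it.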

MASCOT  system  designs  are  represented  by  Activity-Channel-Pool  (ACP)  diagrams  and 
Fig. (6) shows the symbology along with a simple example. Clearly, the logical
outcome  of  a  basic  design  phase  using  MASCOT  would  be  a  set  of  ACP  diagrams  showing 
the  identified  activities  and  how  they  are  related  through  appropriate  IDAs. 
Integrating  a  requirements  phase  using  CORE  with  a  basic  design  phase  using  MASCOT 
specifically  means  changing  a  requirement  (a  CORE)  diagram  into  a  design  (an  ACP) 
diagram.  This  area  of  methodology  is  still  in  the  very  early  stages  of  development 
but  it  is  possible  to  postulate  two  possible  approaches  to  this  step. 

•  A software designer takes the CORE diagram as a statement of the requirement
and by considering the constraints of processor throughput, memory available
etc.,  produces  what  he  views  as  an  optimum  basic  design  in  the  form  of  an  ACP 
diagram.  The  design,  while  reflecting  the  software  architecture,  will  retain 
the  functional  relationships  specified  in  the  requirement  and  because  of  the 
structural  method  of  expression  in  use  it  will  be  possible  to  demonstrate 
that  the  basic  design  conforms  to  the  needs  of  the  requirement. 

•  The second and perhaps controversial approach is to draw parallels between data
relationships  in  the  CORE  sense  and  those  between  activities  in  MASCOT.  In 
short,  propose  a  direct  correspondence  between  a  CORE  diagram  and  an  ACP 
diagram  and  thus  minimise  the  software  design  step  in  the  traditional  sense. 

One  might  suggest  that  this  approach  is  only  feasible  where  performance 
constraints  do  not  exist. 

However, let us assume that by either means the design diagram has been produced.
Subsequent steps consist of further decomposition of each activity or software
function making use of CORE notation. At each 'layer' of this detailed
description the diagrams are encoded in PSL, checked and stored on the database.

The terminating layer is one where the diagrams reflect logic which is directly
transcribable to a programming language, in this case CORAL; however, the
transition is done automatically by use of a specially designed suite of PSA
reports and a formatter.

2.4  PSL/PSA 

This  topic  has  been  left  until  now  so  as  not  to  break  the  continuity  of  the 
discussion  on  the  transition  from  requirements  to  design.  PSL/PSA  is  a  System 
Description  Language,  analyser  and  database  developed  as  part  of  the  ISDOS  project 
at  the  University  of  Michigan.  The  reader  is  referred  to  Ref.  (3)  for  a  discussion 
of  its  background  and  a  description  of  the  more  important  features.  In  the 
context  of  SAFRA,  PSL/PSA  is  being  employed  in  two  specific  ways. 

Conventions have been established for encoding particular subsets of CORE data
sets (i.e. Tabular Entries, Combined Threads etc.) into PSL and running suites
of reports to check their correctness. Such passes are part of the quality
control procedures demanded by the standard. The second area relates to the
detailed specification of software functions at the activity level and below, in
order to produce a database of the specification which may then be used to
automatically generate the programmes which satisfy the root procedure that
corresponds to the activity. PSL consists of a large number of reserved words
pertinent to particular aspects of system description; these are summarised below
and examples of those relevant to processes in particular are given in Section 3.4.

-  Communication and analysis

-  System boundary and input/output flow

-  System structure

-  Data structure

-  Data derivation and manipulation

-  System size and volume

-  System control and dynamics

-  Project management

To complement the language there is a suite of 32 reports available, each one
related to the aspects given above. The reports fall into four categories:
Indented Lists, Matrix, Structure/Chain and Function Flow. Some simple
examples are shown in Fig. (7).
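
As a rough picture of what an indented-list style of report conveys, the Python sketch below walks a table of SUBPARTS relationships and prints each process with its level number and an indentation proportional to its depth. It is a sketch in the spirit of such reports only: it does not reproduce the actual PSA report layout, the function name `indented_list` is hypothetical, and the process names are borrowed from the example of Section 3.4.

```python
def indented_list(name, subparts, level=1):
    """Return report lines: level number, indentation, then the process name."""
    lines = [f"{level} {'  ' * (level - 1)}{name}"]
    for child in subparts.get(name, []):
        lines.extend(indented_list(child, subparts, level + 1))
    return lines


# Parent process -> list of its subparts (a small extract for illustration).
subparts = {
    "MASS-CALCS-ACTIVE": ["CALC-IND-TANK-FUEL-MASS"],
    "CALC-IND-TANK-FUEL-MASS": ["CALC-ONE-TANK-FUEL-MASS", "PUT-ONE-TANK-MASS"],
}

report = indented_list("MASS-CALCS-ACTIVE", subparts)
for line in report:
    print(line)
```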


3.  ILLUSTRATION  VIA  AN  EXAMPLE 
3.1  Introduction

The  examples  given  below  are  the  result  of  two  separate  studies  addressing 
requirements and design as separate entities but for convenience they are
presented  here  as  the  result  of  a  contiguous  series  of  phases.  This  obtains 
because  they  are  the  result  of  two  separate  development  phases  for  the 
methodology  dealing  initially  with  requirements  and  then  addressing  the 
interface  with  a  software  design  phase  and  the  production  of  programmes. 

The example shown here is a Fuel Management System (FMS), or specifically the
processing associated with an FMS, and we will now say a few words about this
requirement.

The  customer  input  was  the  hardware  system  layout  diagram  shown  in  Fig.  (8)  and 
the  need  was  to  generate  the  requirement  for  an  associated  control  system.  The 
design  aim  was  to  produce  an  automatic  FMS  to  reduce  pilot  workload.  It  should 
have  the  capability  of  initiating  the  normal  transfer  sequence  but  should  also  be 
able  to  recognise  faults  and  reconfigure  the  transfer  sequence  accordingly. 

The  agreed  assumptions  for  the  system's  starting  point  were  therefore: 

•  Twin engine aircraft

•  Six fuel tanks  -  Forward Fuselage
                   -  Rear Fuselage
                   -  Left Wing
                   -  Right Wing
                   -  Left External
                   -  Right External

•  Transfer; all tanks shall be capable of transferring fuel to forward and rear
tanks.

•  Asymmetry; fuel asymmetry shall be automatically controlled to provide constant
asymmetry between forward and rear tanks.

•  Information should be provided to enable the ground crew to service the
aircraft via a Ground Service panel.

•  Single failure shall not be catastrophic.

•  Sufficient information shall be made available to the crew to enable interaction
if and when required.

•  System shall perform automatically and provide self diagnosis and self
correction capability.

3.2  Development of Requirements

The requirement was developed through two levels of decomposition. Level 3 and its
associated layers transcend the traditional interface of software design although
in this approach it is seen as a continuing decomposition of the requirement in
order to produce a language independent description. However, for convenience level
3 will be discussed separately. A schematic representation of the hierarchy is
shown in Fig. (9). The objective was, by use of the methodology, to first establish
the interaction between the complete FMS and other systems and then at a subsequent
level establish the interaction between the processing element of the FMS and other
parts of the subsystem. The third level, then, corresponds to a detailed
description of the functions satisfied by the FMS processing element.

The  Viewpoints  selected  at  level  1  were  as  follows: 

•  Allied Command; the actions external to the aircraft of providing fuel, mission
and co-ordination data etc.

•  Other Aircraft Systems; those which have an influence on or are influenced
by the fuel system (i.e. cooling systems, engine systems etc.).

•  Environment in which the aircraft operates.

•  Fuel Management System, embracing pumps, valves, pipes, tanks, processors,
embedded software etc.

The Viewpoints selected at level 2 were as follows:

•  Data Highway, the method of transferring the data from sensors to the processor
and from the processor to controls and displays etc.

•  Controls and Displays, the pilot and ground service panels.

•  Fuel Management System Process, all management and control functions to be
embodied in software.

•  Fuel Handling, hardware such as pumps, valves, pipes, tanks etc.

Some  brief  examples  of  the  CORE  methodology  will  now  be  discussed.  Fig.  (10) 
shows  two  aspects  of  information  gathering  associated  with  the  second  level  of 
decomposition.  A  Tabular  Entry  for  the  Viewpoint  of  FMS  Processing  is  shown 
with  a  decomposition  of  the  data  passing  between  the  FMS  Processing  and  Fuel 
Handling  Viewpoints.  The  additional  level  of  detail  in  the  latter  allows  the 
former  to  be  examined  and  a  number  of  actions  associated  with  the  data  identified. 
The  proposed  threads,  an  example  of  which  is  shown  in  Fig.  (11),  are  identified 
for all the data listed. This particular thread, In Flight Refuel Control
Actions, is given for the Viewpoint FMS Processing and the data links identified
interface with the other three Viewpoints as well as that data identified at the
previous  level  as  flowing  across  the  problem  boundary.  The  interaction  between 
this  proposed  thread  in  the  Fuel  Processing  Viewpoint  and  that  of  other  threads 
in  other  viewpoints  is  now  examined  by  considering  these  data  relationships. 

When  particular  or  thread  relationships  can  be  found  a  Combined  Thread  diagram 
may  be  constructed  as  shown  in  Fig.  (11).  Here,  two  threads  proposed  in  the  FMS 
Processing  Viewpoint  are  shown  interacting  with  In  Flight  Refuel  Valves  Actions 
in the Fuel Handling Viewpoint; data passing into and out of the diagram
demonstrate  the  relationships  with  other  threads. 

These  diagrams  have  been  drastically  simplified  from  the  original  documents  in 
order to help communicate a feeling for the methodology; they also represent a
very  small  sample  from  a  three  volume  data  set.  The  final  step  is  the  construction 
of  the  Operational  diagram  (Fig.  (13)). 

3.3  Basic Design

As  stated  in  2.3  the  basic  design  phase  consists  of  producing  a  design  diagram 
from  the  CORE  requirement  diagram  either  as  a  specific  software  design  activity 
(i.e.  with  recourse  to  optimisation)  or  by  establishing  a  direct  correspondence 
between  the  two  diagrams  via  their  data  relationships.  For  interest  we  will 
discuss  the  latter  approach,  while  bearing  in  mind  that  it  may  produce  a  less 
than optimal solution, and consider the CORE Operational diagram shown in Fig. (13).

There  are  similarities  between  some  of  the  data  relationships  used  in  CORE  and 
those  assumed  in  the  use  of  MASCOT.  Specifically  we  believe  there  is  a 
correspondence between what are termed Thread and Associated Thread relationships
in  the  CORE  domain  and  channels  and  pools  in  the  MASCOT  sense.  The  Operational 
diagram  may  be  redrawn  with  the  appropriate  notation  for  channels  and  pools  once 
the  configuration  of  specific  pools  has  been  decided.  Such  a  conversion  is 
shown  in  Fig.  (12)  and  corresponds  to  a  CORE  Design  diagram.  In  principle  this 
differs  from  a  MASCOT  ACP  diagram  by  having  activities  in  rectangles  rather  than 
circles . 
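
The redrawing step can be pictured as a mechanical mapping from data relationships to IDAs. The Python sketch below is a reading of the proposal, not the authors' tool: it assumes that a Thread relationship maps to a channel and an Associated Thread relationship to a pool (the text names both pairs without stating a one-to-one rule), and the names `CoreLink`, `to_acp`, READ-GAUGES and DISPLAY-FUEL are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreLink:
    """A data relationship between two processes on a CORE Operational diagram."""
    producer: str
    consumer: str
    data: str
    relationship: str   # "THREAD" or "ASSOCIATED-THREAD" (labels assumed here)


# Assumed pairing of CORE relationship to MASCOT IDA class.
IDA_CLASS = {"THREAD": "channel", "ASSOCIATED-THREAD": "pool"}


def to_acp(links):
    """Return ACP-style triples (activity, IDA name, direction) for each link."""
    edges = []
    for link in links:
        ida = f"{link.data} ({IDA_CLASS[link.relationship]})"
        edges.append((link.producer, ida, "writes"))
        edges.append((link.consumer, ida, "reads"))
    return edges


links = [
    CoreLink("READ-GAUGES", "MASS-CALCS", "GAUGING DATA", "THREAD"),
    CoreLink("MASS-CALCS", "DISPLAY-FUEL", "FUEL MASSES", "ASSOCIATED-THREAD"),
]
acp = to_acp(links)
```

Each CORE link yields one IDA with a writer and a reader, which is the bare structure of an ACP diagram; decisions such as merging several links into one pool remain with the designer.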

3.4  Detailed Design

The consequence of the previous steps is a number of software functions,
represented by threads, with an operational view of how these threads interact in
terms of a requirement. This requirement has been used to establish a Basic
Design consisting of Activities and their associated IDAs. For simplicity we will
now consider one of these Activities, specifically MASS-CALCS, and show the steps
undertaken to carry out a detailed definition of the sequential process that
supports this Activity. In MASCOT terminology this is called a Root Specification
and the first description of the MASS-CALCS Root Spec derived at level 2 is shown
in Fig. (14).

Further decomposition of the processes shown on this diagram will not only provide
the appropriate macro expansion but also give a more detailed breakdown of the
data structures associated with the channels and pools. Such decomposition is done
strictly hierarchically and the only resort to CORE is the use of the diagrammatic
notation. Such a decomposition is a lengthy business and the subsequent tree
transcends six layers. The terminating box on each branch of the tree corresponds
to an expansion of the macro referenced in the box. Layers above this
termination are expressed as CORE diagrams and the information they contain is
encoded in PSL for both data and process. The nature of the information stored
in this way is reflected in the example shown below, via the process
CALC-IND-TANK-FUEL-MASS, part of the MASS-CALCS-ACTIVE composition of Fig. (14).
Note that words on the left-hand side of the listing correspond to PSL reserved words.


DEFINE PROCESS      CALC-IND-TANK-FUEL-MASS;
/*  DATE OF LAST CHANGE - JAN 26, 1981, 11:29:25  */
SYNONYMS ARE:       B1P06;
ATTRIBUTES ARE:
    REPEAT-RANGE    SET-OF-TANKS,
    ORDER           3,
    TYPE            'REPEATED-MACRO-CALL';
SUBPARTS ARE:       CALC-ONE-TANK-FUEL-MASS,
                    PUT-ONE-TANK-MASS,
                    ADD-TO-RUNNING-TOTAL,
                    ONE-PROBE-PRELIM-MASS-CALCS,
                    LOOKUP-ATT-CRCTN-FACTORS,
                    WRITE-FUSE-TANK-FUEL-MASS;
PART OF:            MASS-CALCS-ACTIVE;
CREATES:            CALC-IND-TANK-FUEL-MASS-LOCAL;


DERIVES: 
DERIVES : 
USING: 
EMPLOYS: 
UPDATES : 
USING: 
UPDATES : 
USING 


C 

N 

N 

N 

E; 


IND-TANK-FUEL-MASS; 

USABLE-FUEL-MASS j 
DENSITY-CRCTN-FACTOR; 

BUS-ATTITUDE-DATA , 

TANK-ID; 

TANK- ID; 

TANK-RUNNING-TOTAL; 

TANK -ID, 

TANK-RUNNING-TOTAL j 
INCEPTION-CAUSES:   LOOKUP-ATT-CRCTN-FACTORS;
TERMINATION CAUSES: PUT-TOTAL-FUEL-MASS;
ON TERMINATION OF:  SET-TANK-RUNNING-TOTAL-ZERO;
PROCEDURE;
    'FOR' TANK ID := LWT 'STEP' 1 'UNTIL' RFT 'DO'
    CALC IND MASSES (TANK ID, DCF, GAUGING DATA, RUNNING TOT,
                     ATTITUDE DATA,
                     IND MASSES, USABLE MASS);


The  above  may  be  seen  to  consist  of  six  areas: 

•  The process name corresponds to that found on the diagrams with an appropriate
synonym, in this case a keying index which allows the process to be traced
back to the root of this particular tree, MASS-CALCS.

•  Qualities of the process in the form of Attributes may be entered and the
examples given here include the TYPE of process, a Repeated Macro Call, with
the value of the REPEAT-RANGE given as SET-OF-TANKS.

•  Upward and downward hierarchical relationships, via the SUBPARTS and PART OF
terms, show that the macro has several constituent processes that will be
defined at the next layer.

•  Data relationships and their specific significance are represented by:

   USING:    Conventional utilisation of data local to the diagram
             (e.g. TANK-ID).

   EMPLOYS:  Conventional utilisation of data entering the diagram
             (e.g. GAUGING DATA).

   DERIVES:  Production of data which subsequently will leave the diagram
             (e.g. USABLE FUEL MASS).

   UPDATES:  Iterated variable (e.g. TANK-RUNNING-TOTAL).


•  Sequencing and control relationships, such as the first process to be triggered
within the process CALC-IND-TANK-FUEL-MASS, signified by INCEPTION-CAUSES, here
LOOKUP-ATT-CRCTN-FACTORS. Similarly the process to be triggered when this
sequence of processes is complete, signified by TERMINATION CAUSES, here
PUT-TOTAL-FUEL-MASS. Finally the process whose termination will start this
particular sequence, signified by ON TERMINATION OF, here
SET-TANK-RUNNING-TOTAL-ZERO.


•  The last area, classified as PROCEDURE, will ultimately be reserved for
mathematical expression which cannot be described using PSL. However, the
currently experimental status of the PSA report suite and formatter being
employed means that some aspects other than mathematical statements must be
included in CORAL at this time. The PROCEDURE statement is a comment entry
on the database and hence is not amenable to being checked by the analyser.
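
To make the areas of the process description easier to see side by side, the following Python sketch holds a subset of them as fields of a record and derives the control sequence implied by the three sequencing statements. The class `PslProcess` and the function `sequence` are hypothetical names introduced for this illustration; the values are those of the CALC-IND-TANK-FUEL-MASS listing, and the data relationships (USING, EMPLOYS, DERIVES, UPDATES) are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class PslProcess:
    """Subset of the PSL statements discussed above, held as plain fields."""
    name: str
    synonym: str
    attributes: dict = field(default_factory=dict)
    subparts: list = field(default_factory=list)
    part_of: str = ""
    inception_causes: str = ""      # first subpart triggered within the process
    termination_causes: str = ""    # process triggered when this one completes
    on_termination_of: str = ""     # process whose termination starts this one


def sequence(process):
    """Return the control sequence implied by the three sequencing statements."""
    return [process.on_termination_of,
            f"{process.name} (starts with {process.inception_causes})",
            process.termination_causes]


calc = PslProcess(
    name="CALC-IND-TANK-FUEL-MASS",
    synonym="B1P06",
    attributes={"TYPE": "REPEATED-MACRO-CALL",
                "REPEAT-RANGE": "SET-OF-TANKS",
                "ORDER": 3},
    subparts=["CALC-ONE-TANK-FUEL-MASS", "PUT-ONE-TANK-MASS",
              "ADD-TO-RUNNING-TOTAL", "ONE-PROBE-PRELIM-MASS-CALCS",
              "LOOKUP-ATT-CRCTN-FACTORS", "WRITE-FUSE-TANK-FUEL-MASS"],
    part_of="MASS-CALCS-ACTIVE",
    inception_causes="LOOKUP-ATT-CRCTN-FACTORS",
    termination_causes="PUT-TOTAL-FUEL-MASS",
    on_termination_of="SET-TANK-RUNNING-TOTAL-ZERO",
)
steps = sequence(calc)
```

A simple consistency check falls out of this layout: the process named by INCEPTION-CAUSES should itself appear among the SUBPARTS.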


3.5  Program  Generation 

Considering  the  MASCOT  structure  there  are  four  types  of  program  required  to 
complete  a  system  and  these  comprise: 

•  Root Specs, the actual program body for each activity and which form the bulk
of  the  system. 

•  IDA  Specs,  (Pools  and  Channels)  which  include  the  body  of  Access  Procedures. 

•  Module  Declarations,  which  list  all  the  components  of  the  above  giving  the 
actual  names  to  be  used  in  the  Form  lists. 

•  Form Lists describe how the system goes together, i.e. which pools and channels
connect  which  activities  to  form  (via  subsystems)  the  complete  system. 

In  principle  all  the  information  pertinent  to  these  programs  should  be  found  on 
or  derived  from  the  database  but  here  we  will  describe  the  steps  concerned  with  a 
Root  Spec  and  for  simplicity  only  for  the  first  macro. 

The information needed to construct the first macro can be found in the PSL
statements ATTRIBUTES ARE, PROCEDURE, TRUE-WHILE and FALSE-WHILE. An attribute,
when applied to a process, will have names TYPE and ORDER, and when applied to
data items will have names TYPE and CORAL-NAME. A process will correspond to a
macro name and the attribute TYPE will thus describe the type of macro; in this
case it is ROOT-SPEC (i.e. ROOT-SPEC being the VALUE of TYPE). A data item can
be an ENTITY, GROUP or ELEMENT where the latter corresponds to an indivisible
CORAL variable while ENTITY and GROUP are types of ELEMENT collections. Here
the attribute name TYPE describes the CORAL variable for the purpose of making
declarations within the first macro.

The  structure  of  a  macro  is  made  up  of  Heading,  Declarations,  Calls  and  Close 
and  the  respective  code  elements  are  found  as  below. 

•  Heading; in the PROCEDURE comment entry of the process with the macro name,
the attribute name TYPE will have the value ROOT-SPEC.

•  Declarations; in the attribute description of the appropriate GROUPS and
ELEMENTS.

•  Calls; in the PROCEDURE comment entries of the SUBPARTS of the first macro
and the comment entries of the CONDITION section.

•  Close; not in the database and thus created.

The approach adopted is:

(i)  Identify the macro name
(ii)  Trace the local variables from the macro name
(iii)  Trace the first layer subparts from the macro name.

Although  this  is  the  route  for  extracting  the  required  Information  the  order 
which  the  call  code  elements  must  take  in  the  macro  cannot  be  guaranteed.  This 
has  been  overcome  by  the  use  of  an  additional  attribute  ORDER  which  provides  the 
appropriate  key. 
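
The role of the ORDER attribute can be shown with a short sketch. The Python below is a minimal illustration, not the PSA report suite: the function `build_macro` is hypothetical, the ORDER values attached to the calls are invented for the example, and the call texts are taken from the generated CORAL shown later in this section. It assembles the four parts of a macro and uses ORDER as the sort key for calls retrieved in arbitrary order.

```python
def build_macro(name, declarations, calls):
    """Assemble Heading, Declarations, Calls and Close for one macro.

    `calls` is a list of (order, call_text) pairs in arbitrary order, as they
    might come back from the database; ORDER supplies the sort key.
    """
    lines = ["'DEFINE'", name, "\"'BEGIN'"]             # Heading
    lines += declarations                                # Declarations
    lines += [text for _, text in sorted(calls)]         # Calls, keyed on ORDER
    lines.append("'END'\";")                             # Close (not in database)
    return lines


# ORDER values below are assumed for illustration only.
calls = [
    (3, "ADD TO RUN TOT (ONE TANK MASS, RUNNING TOT);"),
    (1, "LOOK UP ATT FAC (BUS ATTITUDE DATA, TANK ID, ATT FAC);"),
    (2, "CALC ONE TANK M (DCF, ATT MASS, ONE TANK MASS);"),
]
macro = build_macro("CALC IND MASSES (TANK ID, DCF)",
                    ["'FLOATING' ATT FAC;"],
                    calls)
```

Sorting on the pair places the calls in ORDER sequence regardless of the order in which the subparts were traced, which is the property the additional attribute is there to provide.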

The  programme  is  generated  by  applying  the  above  strategy  through  a  suite  of 
PSA  reports,  where  the  results  of  one  report  act  as  the  file  input  to  the  next. 
The last step before submission to the compiler is a Formatter which deletes
extraneous PSA messages accrued during the previous steps. An example of
CORAL generated in this way is shown below representing the macro
CALC-IND-TANK-FUEL-MASS and calls to two layers below.

'COMMENT' ====>MACRO PSLNAME=CALC-IND-TANK-FUEL-MASS ;
'DEFINE'
CALC IND MASSES (TANK ID, DCF, GAUGING DATA, RUNNING TOT,
                 ATTITUDE DATA,
                 IND MASSES, USABLE MASS)
"'BEGIN'
'COMMENT' ====>MACRO PSLNAME=CALC-IND-TANK-FUEL-MASS ;
'FLOATING' 'ARRAY' ATT MASS [1:3];
'INTEGER' PROBE;
'INTEGER' ONE TANK MASS;
'INTEGER' PROBE MASS;
'FLOATING' ATT FAC;
LOOK UP ATT FAC (BUS ATTITUDE DATA, TANK ID,
                 ATT FAC);
'FOR' PROBE := 1 'STEP' 1 'UNTIL' 3 'DO'
    ONE PROBE PMC (PROBE, TANK ID, GAUGING DATA,
                   ATT MASS);
CALC ONE TANK M (DCF, ATT MASS,
                 ONE TANK MASS);
WRITE USABLE MASS (ONE TANK MASS, TANK ID,
                   USABLE MASS);
PUT ONE MASS (ONE TANK MASS, TANK ID,
              IND MASSES);
ADD TO RUN TOT (ONE TANK MASS, RUNNING TOT);
'END'";
'COMMENT' ====>MACRO PSLNAME=LOOKUP-ATT-CRCTN-FACTORS ;
'DEFINE'
LOOK UP ATT FAC (BUS ATTITUDE DATA, TANK ID,
                 ATT FAC)
"'BEGIN'
'COMMENT' ====>MACRO PSLNAME=LOOKUP-ATT-CRCTN-FACTORS ;
'FLOATING' P,R,A,B,PITCH ANGLE,ROLL ANGLE;
'IF' TANK ID = LWT 'THEN'
'BEGIN'
    GET ATT 'OF' DH IN (BUS ATTITUDE DATA);
    PITCH ANGLE := BUS ATTITUDE DATA[0];
    ROLL ANGLE := BUS ATTITUDE DATA[1];
    P := (1/COS(PITCH ANGLE)-1);
    R := (1/COS(ROLL ANGLE)-1);
'END';
'IF' TANK ID = LWT 'OR' TANK ID = RWT 'THEN'
'BEGIN'
    'IF' PITCH ANGLE 'GE' 0 'THEN' A:=P 'ELSE' A:=-P;
    'IF' ROLL ANGLE 'LE' 20 'AND' ROLL ANGLE 'GE' -20 'THEN' B:=0 ;
    ATT FAC := 1+(A+B);
'END'
'ELSE'
'BEGIN'
    'IF' PITCH ANGLE<=30 'AND' PITCH ANGLE>=-30 'THEN' A:=0;
    'IF' ROLL ANGLE<=20 'AND' ROLL ANGLE>=-20 'THEN' B:=R;
    ATT FAC := 1+(A+B);
'END';
'END'";
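
The chaining of reports and the final Formatter step described before the listing can be pictured as a pipeline of filters. The Python sketch below is illustrative only: the function names are invented, and the `PSA:` prefix used to recognise analyser messages is an assumption made for this example, since the actual form of those messages is not given in this paper.

```python
def strip_psa_messages(lines, marker="PSA:"):
    """Drop lines that look like analyser messages and keep everything else.

    The marker is a hypothetical convention chosen for this sketch.
    """
    return [ln for ln in lines if not ln.lstrip().startswith(marker)]


def run_pipeline(data, stages):
    """Feed the output of each stage to the next, as with chained reports."""
    for stage in stages:
        data = stage(data)
    return data


raw = [
    "PSA: REPORT COMPLETE",
    "'INTEGER' PROBE;",
    "  PSA: 1 NAME SELECTED",
    "'FLOATING' ATT FAC;",
]
clean = run_pipeline(raw, [strip_psa_messages])
```

Only the two CORAL declarations survive the filter; further stages, such as one that orders calls, could be appended to the `stages` list without changing the driver.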

4.  STATUS 


As stated in the introduction the techniques described above are still in the
development stage and the status of particular aspects is given below.

•  PSL/PSA,  MASCOT  and  CORAL  are  all  commercially  available  and  mature.  PSL/PSA  has 
been  used  extensively  in  the  United  States  for  the  statement  of  requirements  and 
MASCOT  and  CORAL  have  been  employed  on  a  number  of  real  time  projects  in  the 
United  Kingdom. 

•  CORE, as a requirements methodology, is being used on a number of small projects
within  BAe  and  its  use  continues  to  grow.  Considerable  effort  has  been  expended 
in  solving  the  transfer  problem  and  an  intensive  training  course  is  available  to 
members  of  new  projects. 

•  Experience  to  date  has  highlighted  the  problem  of  data  preparation  of  PSL  from 
CORE documentation as well as the control of the large data sets produced by the
method.  In  order  to  solve  both  of  these  problems  a  computer  based  CORE  work 
station  is  currently  under  development  which  will  enable  requirements  to  be 
developed  at  a  terminal  and  automatically  produce  the  associated  PSL. 

•  The  links  with  MASCOT  and  CORAL  are  experimental  and  a  project  currently  being 
undertaken will seek to evaluate the conventions given above, as well as providing
the  means  to  produce  a  more  powerful  CORAL  generator. 

•  Long  term  plans  include  interfacing  CORAL  with  Ada  including  the  automatic 
generation  of  Ada  programmes. 

5.  REFERENCES 

1.  An Approach to the Derivation and Validation of Requirements.
    A. O. Ward. AGARDograph No. 258, Guidance and Control Software. May 1980.

2.  MASCOT, a Structured Software Design Methodology for Real Time Systems.
    Infotech Seminar, Nov. 1978.

3.  PSL/PSA: A Computer-Aided Technique for Structured Documentation and Analysis
    of Information Processing Systems.
    D. Teichroew and E. A. Hershey III. IEEE Transactions on Software
    Engineering. Jan. 1977.


FIG. 7  TYPICAL EXAMPLES OF PSA REPORTS


FIG. 8  FUEL  SYSTEM  LAYOUT 


FIG. 9 FUEL MANAGEMENT SYSTEM DECOMPOSITION HIERARCHY


FIG. 10 MECHANISMS FOR GATHERING INFORMATION: TABULAR ENTRIES AND DATA DECOMPOSITION.

(Tabular entry for F.M.S. PROCESSING. Sources and destinations: data highway, fuel handling.
Inputs from the bus: refuel demands, ground transfer demands, defuel demands, fuel C of G,
A/C state data, inflight fuel demands, dump select. Raw inputs: valves position data, sensor data.
Outputs: ground panel valve state data, flight time left, fuel flow rate, warnings, fuel asymmetry,
tanks transferring indications, fuel data, leak data, valves state data, fuel C of G, engine feed
pressure. Action: control data, decomposed into pump, refuel valves, dump isolate, defuel valves
and pump, inflight refuel valves, cooling valves, boost pump speed, transfer valves and transfer
pump control data.)




A PEARL Software System for Multi-Processor Systems

Dr.  P.  Elzer 
Dr.  H.-J.  Schneider 
Dornier  System  GmbH 
Postfach  1360 
7990 Friedrichshafen
F  R  G 


Summary 

Most present-day and all future systems will be processor based, and there is a trend towards
multi-processor systems. This is true for all types of systems, not excluding airborne ones.

Up to now the majority of these systems have been programmed in assembly language, a very awkward
and expensive job.

Seeing the difficulties arising from low-level coding, Dornier System implemented a
high-order-language system based on PEARL to program multi-processor systems in an airborne or
similar environment. This environment imposed certain conditions on the implementation: it was
necessary to minimize the overhead produced by the operating system, and the generated code was
optimized to a very high efficiency with respect to time and memory.

Originally PEARL was aimed at process control. Owing to the application area considered here,
subsetting of PEARL was possible. This was done with high code efficiency and a smaller,
modular operating system in mind.

On  the  other  hand  extensions  to  allow  distributed  processing  were  implemented. 

The  system  consists  of 

-  Language (subset of Basic PEARL)

-  Compiler

-  Assembler 

-  Linker/Loader 

-  Testing  aids 

-  Special  hardware  for  testing 

It runs on a host computer and is written in FORTRAN for portability. The target processors
implemented up to now are the DORNIER DP 432, the AEG 80-20 and the DORNIER DP 426, which
is based on an INTEL 8086.

The  system  was  successfully  used  in  several  applications. 


1. Introduction

It is a well-known fact that High-Order Languages (HOLs) are one of the most successful
means of improving the productivity of programmers as well as the quality of programs. For
several years, however, there was a heated discussion among experts as to whether or not
this was also true for real-time and other time-critical applications, e.g. avionics
or guidance and control applications. This discussion was mostly not well supported
by quantitative data, and it was therefore felt necessary to conduct a study (1) on
the applicability of High-Order Languages to guidance and control. A further task was to
find out which special aspects had to be taken into consideration in this admittedly
difficult application area. The study concentrated on the language PEARL (= Process and
Experiment Realtime Automation Language), because it was the most promising candidate
language in the defence environment.

The results were very encouraging. It turned out that all of the relevant problems could
be formulated in the language. It was not even necessary to exploit its full descriptive
power. There was one exception, however: PEARL did not yet contain all the elements
necessary for the programming of distributed systems and therefore had to be slightly
expanded for this purpose.

Another important result was that the efficiency of the compiler and the size of the
underlying operating system were of crucial importance for the usefulness of a HOL in
guidance and control applications. The reason is that, in this class of
applications, memory, however cheap, is still subject to severe limitations such as physical size,
energy consumption, or weight. Dynamic efficiency of the programs is important too,
because guidance and control processes tend to be extremely time-critical.

It also turned out that translators for HOLs in guidance and control had to provide very
elaborate test and integration aids because of the intrinsic difficulties in testing and
integrating embedded computer systems.

It  was  therefore  decided  that  Dornier  System  should  develop  a  PEARL  translation  system 
under contract with the German MOD (BMVg) which fulfilled the following requirements:

-  Extreme  Efficiency  of  the  compiled  code 

-  Elimination  of  Operating  System  Overhead  as  far  as  possible 

-  Possibility  to  program  distributed  systems 

-  Possibility to separate code elements in RAM from those in PROM-type memory

-  Optional support for system integration

-  Adaptability  to  various  target  processors 

-  Easy  transportability  between  host-processors 

It  was  also  obvious  that  it  would  not  be  sufficient  to  just  develop  a  compiler.  It  was 
rather necessary to develop an entire PEARL translation system for distributed systems
which  consisted  of  the  following  components: 

-  Compiler-generator 

-  Compiler  front-end 

-  Code  generator 

-  Assembler 

-  Library  management 

-  Modular  operating  system 

-  Linking  loader 

-  Test  and  Integration  aids 

The construction principles of that system and details about its implementation have
already been published several times (3, 4, 5, 6).


2.  The  Language  PEARL 

The development and the properties of PEARL have also already been published rather broadly,
e.g. in (7, 8). For the purposes of this paper it is therefore sufficient to concentrate
on a few highlights.

2.1  Development  and  support  of  PEARL 

PEARL was developed in the early seventies by a group of computer manufacturers, software
houses and research institutes in the FRG. The development was organized by the University
of Erlangen and mainly sponsored by the German Ministry of Research and Technology (BMFT).
The first experimental compilers were finished in 1975 and full-scale industrial applications
started in 1977. Today, more than 200 PEARL applications are in operation throughout
the FRG in a broad variety of technological areas including defense systems.

Uniformity and continuity of PEARL are ensured by DIN standards. The draft standard
DIN 66253, Part 1, 'Basic PEARL', is available; Part 2, 'Full PEARL',
followed in August 1980. In addition, PEARL has been submitted to ISO for international
standardization.

The support organization for PEARL is the 'PEARL Association' with offices at the following
addresses:

-  PEARL Association
   Graf-Recke-Strasse 84
   Postfach 1139
   D-4000 Düsseldorf 1
   F.R.G.

-  PEARL Association
   c/o Institut fuer Regelungstechnik und Prozessautomatisierung
   Technical University of Stuttgart
   Seidenstrasse 36
   D-7000 Stuttgart 1
   F.R.G.


2.2 Features of PEARL

PEARL  has  been  developed  for  the  application  engineer.  Great  emphasis  has  therefore  been 
laid  upon  language  elements  which  facilitate  the  design  of  application  programs  in  a 
real-time  and  process-control  environment.  The  most  important  language  elements  belong  to 
the  following  groups: 

2.2.1  Real-time  Language  Elements: 

To the knowledge of the authors, PEARL currently contains the most complete set of elements
for the description and control of parallel processes. It is possible to declare program
components as 'tasks' and to initiate and control their execution as parallel processes, to react
to interrupts and exceptions, and to connect these actions to external time conditions.

For example, complex scheduling conditions like the following can be described at the
language level:

AFTER  5  SEC  ALL  7  SEC  DURING  106  MIN 
ACTIVATE  MEASUREMENT  PRIORITY  5; 

This means that five seconds after the execution of this statement the computing process
'MEASUREMENT' is activated with priority five, and thereafter every seven seconds for a total
period of one hundred and six minutes.
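
The timing implied by the statement above can be worked out with a minimal sketch in Python. It is only an illustration of the arithmetic, not part of any PEARL system: the function name `activation_offsets` is hypothetical, and it is assumed here that the DURING period is counted from the first activation and includes its end point.

```python
from datetime import timedelta


def activation_offsets(after, every, during):
    """Offsets, measured from the execution of the statement, at which the
    task is activated. Assumption: the DURING window starts at the first
    activation and its end point is inclusive."""
    offsets = []
    t = after
    end = after + during
    while t <= end:
        offsets.append(t)
        t += every
    return offsets


# AFTER 5 SEC ALL 7 SEC DURING 106 MIN ACTIVATE MEASUREMENT PRIORITY 5;
offsets = activation_offsets(
    after=timedelta(seconds=5),
    every=timedelta(seconds=7),
    during=timedelta(minutes=106),
)
print(len(offsets), offsets[0], offsets[-1])
```

Under these assumptions the single PEARL statement replaces several hundred explicit timer requests that an assembly-language program would have to manage itself.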

2.2.2  Description  of  the  Hardware  Configuration 

In the 'system division' of PEARL programs the hardware configuration, especially the
process peripherals and the data paths, can be described separately from the application
algorithm proper, the 'problem division'. The relevant terminal points for I/O operations
can be named by symbolic identifiers and thus be referred to in the 'problem division'
independently of the actual hardware. This capability greatly enhances the documentation
value and portability of PEARL programs.

2.2.3  Input/Output  Language  Elements 

PEARL contains a consistent general I/O model for nonstandard devices as well as a set of
user-oriented I/O statements for the most usual operations. The general I/O model is
based on the observation that each data path in a digital system can be described by a
sequence of 'data-stations' ('dations') and 'interfaces'. A data-station can either be a
source of data, a sink, or intermediate storage. It further has so-called 'channels' which
can be of the following types: 'data', 'control', 'signal' and 'interrupt'. The 'interfaces'
are in principle sets of conversion routines which map the output characteristics
of one dation onto the input characteristics of the following one.
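
The dation/interface model can be pictured with a small Python sketch. It is a loose illustration only: the class names, the example path (an analogue-to-digital converter feeding a temperature value) and the conversion factors are all hypothetical, and the 'channel' types are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dation:
    name: str
    role: str  # "source", "sink" or "intermediate"


@dataclass
class Interface:
    name: str
    convert: Callable[[float], float]  # conversion routine between two dations


def send_along(path: List[object], value: float) -> float:
    """Pass a value down a path of alternating dations and interfaces,
    applying each interface's conversion routine in turn."""
    for element in path:
        if isinstance(element, Interface):
            value = element.convert(value)
    return value


# Hypothetical data path: 12-bit converter counts -> volts -> degrees Celsius.
path = [
    Dation("ADC", "source"),
    Interface("counts_to_volts", lambda counts: counts * 10.0 / 4096),
    Dation("BUFFER", "intermediate"),
    Interface("volts_to_celsius", lambda volts: volts * 20.0),
    Dation("TEMPERATURE", "sink"),
]
reading = send_along(path, 2048)
print(reading)
```

The point of the model is that only the interfaces know about representations; the dations at either end can be exchanged without touching the application code.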

The  user-oriented  I/O  statements  are  the  following  ones: 

-  GET/PUT for character transfer

-  READ/WRITE  for  file  handling 

-  TAKE/SEND  for  process  peripherals 

All  necessary  format  and  control  elements  are  provided. 

2.2.4  Algorithmic  Language  Elements 

The number and descriptive power of the language elements for the formulation of algorithms
and procedures correspond to the state of the art of modern programming languages. The
concept of data types in Full PEARL enables the user to define problem-oriented, composite
data types and new operators. These abstract data types permit a great number of compile-time
checks and contribute to a refined modular structure.

2.2.5 Modular Program Structure

Last, but not least, PEARL supports modular program design and separate compilation of
program  components.  A  PEARL  program  is  composed  of  separately  compilable  modules  with 
exactly  defined  interfaces.  This  structure  also  greatly  facilitates  communication  between 
the  members  of  a  project  team  and  supports  the  modular  composition  of  complex  program 
systems. 


3. The PEARL Implementation by Dornier System

As already mentioned above, the characteristics of the PEARL implementation by Dornier
System are mainly dictated by the requirements of its application area. They are most
obviously reflected in the choice of the implemented language subset.

3.1 The Language Subset

For the reasons mentioned above, language elements known to result in poor object-code
efficiency or unnecessary run-time overhead were not implemented.

In particular, these elements are:

-  Filehandling  (on-board  computers  usually  are  not  equipped  with  magnetic  background 
storage  devices) 

-  Formatting (on board there are practically no printing devices, and the few which
exist can easily be handled by stream output of character strings)

-  Absolute time (time is usually counted relative to 'mission start')

-  Signals (exception handling is a source of huge overhead and it is mandatory that
unplanned software conditions do not occur during the operational phase of a system)

-  Structures (application studies showed that measurement data are usually of homogeneous
type).

On the other hand, certain extensions had to be provided for the programming of distributed
systems. However, it was a strict policy to keep them very small in order not to deviate
too much from the original PEARL. Another important design criterion for these
multi-computer extensions was that they had to be 'strategy independent', i.e. the user should
be able to implement whatever concept he deemed optimal for the safety or redundancy
strategy of his application. These considerations resulted in the following extensions:

-  Declaration of entities with the attribute 'NET GLOBAL' of types 'variable', 'semaphore'
and 'task'. These entities are then either copied into or made known to every processor
in the distributed system.

-  Operations on such entities. This was achieved without additional statements or operators,
just by extending the semantics of existing operations (overloading).
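
The idea of extending an existing operation rather than adding a new one can be sketched in Python as follows. This is a toy model under stated assumptions, not the Dornier mechanism: the classes `Processor`, `Variable` and `NetGlobalVariable` are hypothetical, and it is assumed that a 'NET GLOBAL' variable is simply copied into every processor.

```python
class Processor:
    """A processor with its own private memory (name -> value)."""

    def __init__(self, name):
        self.name = name
        self.memory = {}


class Variable:
    """An ordinary variable: it lives on one processor only."""

    def __init__(self, name, home):
        self.name = name
        self.copies = [home]

    def assign(self, value):
        # The same operation serves both kinds of variable; only the set of
        # copies differs (assumed replication strategy).
        for processor in self.copies:
            processor.memory[self.name] = value


class NetGlobalVariable(Variable):
    """A variable declared NET GLOBAL: a copy exists on every processor."""

    def __init__(self, name, processors):
        self.name = name
        self.copies = list(processors)


p1, p2, p3 = Processor("P1"), Processor("P2"), Processor("P3")
count = Variable("COUNT", home=p1)
altitude = NetGlobalVariable("ALTITUDE", [p1, p2, p3])

count.assign(1)        # visible on P1 only
altitude.assign(1500)  # same operation, now visible on all processors
```

The application code uses the same assignment in both cases, which is what keeps such an extension small; how and when the copies are updated is left to the strategy chosen by the user.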

Besides, there is a facility for the connection to 'external' tasks or procedures, which
may e.g. be written in Assembler. Last, but not least, runtime checks can be inserted on
a statement-by-statement basis by means of 'check/nocheck' statements.

3.2 The Compiler Front-End and Its Technology

The technology which had to be used for the translator was determined by the requirements
of adaptability to various target processors and easy transportability with respect to the
host processor. This led to the usual separation into a 'front-end', which is independent
of the target machine and translates PEARL into machine-independent intermediate code,
and a target-specific code generator.




The compiler front-end is written in FORTRAN for the following reasons:

-  FORTRAN  translators  are  available  for  nearly  every  possible  host  computer 

-  A compiler written in FORTRAN is much more readable and much easier to maintain
than one which is constructed according to an elaborate bootstrapping
technology.

It  turned  out  that  this  decision  was  the  right  one.  The  front-end  could  be  adapted  to  the 
following  host-computers  with  an  effort  of  a  few  man-days  each: 

DEC  PDP-11/70  and  11/44 
AEG-Telefunken 80-20/4
Siemens  7760 
DEC  PDP  10 

Fig. 1 shows an overview of the structure of the entire translation system.

The intermediate representation had to be chosen according to the requirement of maximum
code efficiency. Therefore it was not possible to use one of the usual virtual machine
representations, because these usually no longer contain all the information which
was present in the source program and which is necessary for optimization. Besides, modern
target processors usually have a more powerful instruction set than the one which happens
to be implemented in a particular virtual machine architecture. This, too, leads to code
inefficiencies.

Therefore it was decided to use a completely target-independent intermediate representation,
the so-called 'triple-code'. In principle it is a numeric representation of the program,
where the individual operation is of the form:
operator, operand 1, operand 2
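
To make the idea of a triple representation concrete, the following Python sketch turns an arithmetic expression into a list of (operator, operand 1, operand 2) triples. It is a generic textbook-style illustration, not the Dornier triple-code: the real format is numeric and its details are not given here, so the symbolic operator names and the `("T", n)` notation for "result of triple n" are assumptions.

```python
import ast

# Symbolic operator names used only for this illustration.
OPERATORS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def to_triples(expression):
    """Translate an arithmetic expression into (operator, operand 1, operand 2)
    triples. An operand is a name, a constant, or ("T", n), meaning the result
    of triple number n."""
    triples = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            left = visit(node.left)
            right = visit(node.right)
            triples.append((OPERATORS[type(node.op)], left, right))
            return ("T", len(triples) - 1)
        raise ValueError("unsupported construct in expression")

    visit(ast.parse(expression, mode="eval").body)
    return triples


for number, triple in enumerate(to_triples("A + B * C")):
    print(number, triple)
```

Because operands still refer to source-level names rather than registers or stack slots, a later code generator keeps the freedom to choose the best instruction sequence for its particular target.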

To sum up: the compiler front-end is written in FORTRAN and translates PEARL source programs
into triple-code. It can detect approximately 200 different syntactic and
semantic errors and identifies them by statement number, name of object and additional
information, if necessary.

During translation the following listings can be produced on request:

-  Source  listing 

-  Cross-reference listings for the following objects with their respective attributes
(e.g. 'GLOBAL')

.  Variables 
.  Tasks 
.  Semaphores 
.  Procedures 
.  Labels 
.  Dations

-  Hierarchies  of  procedure  calls 

-  Process  hierarchy 

-  Synchronization  structure 

-  Location  of  variables 


3.3 The Code-Generator

It produces symbolic assembly code with relative addresses for the target processor in
question. This second intermediate layer has the disadvantage of an additional translation
step, which may cost some time during translation, but this is more than balanced
by the advantages. For example, the assembler listing provides an excellent means for final
compiler testing and for easy linkage of external routines.

At the moment code-generators exist for the following target processors:

-  DORNIER-MUDAS DP 432/133

-  AEG-Telefunken 80-20

-  DORNIER-MUDAS DP 426 (INTEL 8086-based)


3.4  Assembler 

This component is necessary for the reasons given above. It is fully integrated into the
translator system but usually adapted from the support software provided by the vendor
of the target processor.


3.5 Pre-Linker

In case the linking-loader, which is provided by the vendor of the target processor, is not
capable of handling the multi-module structure of PEARL programs, a pre-linker is provided,
which performs the following functions:

-  Identification of program modules to be linked together

-  Distribution of code into RAM or ROM

-  Distribution of program modules over the various processors of the distributed system

-  Completeness check for the definition of global entities

-  Linkage of the operating system components required by the program

-  Sorting of task-control-blocks and code segments

-  Output  of  the  control  sequence  for  the  linking  loader 
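
Two of the pre-linker functions listed above, the completeness check for global entities and the distribution into RAM or ROM, can be sketched in Python. The data layout, the function names and the module names are hypothetical, and the placement rule (code and constants into ROM, variables into RAM) is an assumption made for the illustration.

```python
def check_globals(modules):
    """Completeness check: every global entity that is used must be defined
    in exactly one module. Returns (missing, duplicates) as sorted lists."""
    defined_in = {}
    duplicates = set()
    used = set()
    for module_name, info in modules.items():
        used |= info["uses"]
        for entity in info["defines"]:
            if entity in defined_in:
                duplicates.add(entity)
            else:
                defined_in[entity] = module_name
    missing = used - set(defined_in)
    return sorted(missing), sorted(duplicates)


def place_segments(segments):
    """Assumed placement rule: code and constants go to ROM, variables to RAM."""
    placement = {"ROM": [], "RAM": []}
    for name, kind in segments:
        area = "ROM" if kind in ("code", "constant") else "RAM"
        placement[area].append(name)
    return placement


# Hypothetical modules of a small application.
modules = {
    "NAV": {"defines": {"POSITION"}, "uses": {"AIRSPEED"}},
    "AIRDATA": {"defines": {"AIRSPEED"}, "uses": {"POSITION", "PRESSURE"}},
}
missing, duplicates = check_globals(modules)
placement = place_segments(
    [("NAV_CODE", "code"), ("GAIN_TABLE", "constant"), ("POSITION", "variable")]
)
print(missing, duplicates, placement)
```

Here the check reports that 'PRESSURE' is used but never defined, the kind of error that is far cheaper to find before loading than during integration of the embedded system.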




3.6  Linking-Loader 

This tool performs the linkage process proper and produces absolute code. In case it
cannot  be  taken  from  the  vendor's  software  it  is  delivered  together  with  the  PEARL-System 
and  is  functionally  integrated  into  the  pre-linker. 


3.7  Modular  Operating  System 

This is a unique feature of the DORNIER PEARL-System. It allows efficient use of PEARL
even in the smallest target configurations. This is achieved by abandoning the concept of
an underlying, more or less autonomous and "monolithic" operating system. It is replaced
by  a  set  of  routines  which  are  automatically  linked  to  the  application  program  according 
to  its  requirements.  These  routines  operate  on  task-control-blocks,  time-order-blocks, 
etc.  which  are  provided  by  the  compiler.  Thus  it  was  possible  to  reduce  the  size  of  the 
operating  system  kernel  to  a  mere  300  to  500  16-bit  words,  depending  on  the  quality  of  the 
instruction  set  of  the  target  processor.  This  kernel  includes  the  following  functions: 

-  Initialization 

-  Dispatcher 

-  An  exit  routine,  which  is  executed  if  the  system  knows  that  there  will  be  no  task 
switching 

The following functional modules can then be added automatically according to the
requirements of the application program:

-  Clock-routines 

-  Interrupt  handler 

-  Activation  of  tasks 

-  Task-termination  (regular) 

-  Task-termination  (irregular;  by  'TERMINATE') 

-  Suspension  of  tasks 

-  Continuation  of  suspended  tasks 

-  Deletion  of  a  schedule  ('PREVENT') 

-  Inter-processor  communication 

-  User  command  interface 

-  Character  I/O  ('GET',  'PUT') 

-  Procedure  entry/exit 

-  Array  indexing 

-  Arithmetic  routines  for  FLOAT  and  DURATION  types 

-  Comparison  routines  for  FLOAT  and  DURATION  types 

-  Type  conversion  routines 

-  Standard functions (ABS, SIGN)

-  Handling  of  runtime  errors 

If all operating system services are invoked, the operating system occupies 4 to 6 K 16-bit words,
depending on the architecture of the target processor.
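
The principle of linking only the routines an application needs can be sketched in Python. The module names are taken from the lists above, but the mapping from PEARL statements to modules is an assumption made purely for illustration, as is the function name `select_modules`.

```python
# Kernel functions that are always present.
KERNEL = ["initialization", "dispatcher", "exit routine"]

# Assumed mapping from statements used in a program to the functional
# modules they require (illustrative only).
REQUIRES = {
    "ACTIVATE": ["activation of tasks"],
    "TERMINATE": ["task termination (irregular)"],
    "SUSPEND": ["suspension of tasks"],
    "CONTINUE": ["continuation of suspended tasks"],
    "PREVENT": ["deletion of a schedule"],
    "AFTER": ["clock routines"],
    "ALL": ["clock routines"],
    "GET": ["character I/O"],
    "PUT": ["character I/O"],
}


def select_modules(statements_used):
    """Return the kernel plus exactly those modules the program requires,
    each listed once, in order of first use."""
    selected = list(KERNEL)
    for statement in statements_used:
        for module in REQUIRES.get(statement, []):
            if module not in selected:
                selected.append(module)
    return selected


# A program that only schedules a task periodically and prints characters.
linked = select_modules(["ACTIVATE", "AFTER", "ALL", "PUT"])
print(linked)
```

A program that never suspends or terminates tasks irregularly thus never pays for those routines in memory, which is how the configuration can stay close to the kernel size for small applications.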


3.8 Library Management

In  order  to  be  able  to  fully  exploit  the  possibilities  of  the  modular  structure  of  PEARL 
programs  and  to  enable  the  user  to  expand  his  system-library  by  himself,  a  special  library 
management  package  is  provided. 

It  contains  the  following  functions: 

-  Inclusion  of  a  new  module 

-  Deletion  of  a  module 

-  Listing  of  the  Directory 

-  Modification  of  module  names 


3.9  Test  and  Integration  Aids 

Firstly, these include all the above-mentioned listings which are produced by the compiler
and  serve  as  reference-documents  for  the  user  during  test  and  integration. 

Additionally  there  are  runtime  checks,  which  are  on  request  inserted  into  the  program 
either  by  the  compiler  or  as  operating  system  routines.  The  following  errors  can  be 
monitored: 

-  Array  index  overflow 

-  Division  by  zero 

-  Range  violation 

-  Conversion  errors 

These  runtime  checks  can  be  enabled  or  disabled  by  the  'check/nocheck'  feature. 
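
The effect of enabling or disabling an array index check can be illustrated with a small Python model of a flat memory. Everything in it is hypothetical (the memory contents, the array layout and the function name); it only shows why such checks are valuable during test and why they may be switched off for verified, time-critical code.

```python
class ArrayIndexError(Exception):
    """Raised by the inserted runtime check on an array index overflow."""


# Toy flat memory: array A occupies cells 0..2, an unrelated value follows it.
MEMORY = [10, 20, 30, 99]
A_BASE, A_LENGTH = 0, 3


def load_element(index, check):
    """Load A(index), with indices counted from 1. With check=True the
    compiler-inserted test is modelled; with check=False the address is
    computed blindly, as unchecked code would do."""
    if check and not 1 <= index <= A_LENGTH:
        raise ArrayIndexError(f"index {index} outside 1..{A_LENGTH}")
    return MEMORY[A_BASE + index - 1]


print(load_element(2, check=True))    # a normal access
print(load_element(4, check=False))   # silently reads the neighbouring cell
try:
    load_element(4, check=True)
except ArrayIndexError as error:
    print("runtime check:", error)
```

The unchecked access is faster but returns a value that does not belong to the array; the checked access costs a comparison on every use, which is the trade-off the statement-level switch lets the programmer control.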

Furthermore,  several  trace-routines  can  be  built  into  the  code: 

-  Jump  trace 

-  Subroutine  trace 

-  Call  trace 

-  Task  trace 

Another  important  component  is  the  debugger,  which  can  be  loaded  together  with  the  object 
program.  It  supports  the  following  test  functions: 

-  Activation  and  continuation  of  tasks 

-  Set and reset of breakpoints

-  Output of environment information at breakpoints

-  Input  and  display  of  values  of  variables 

-  Exit  from  Debugger  (and  return  to  normal  execution  of  the  program) 

The  design  of  this  debugger  allows  for  two  modes  of  operation: 

-  Debugging  on  assembler  level 

-  Debugging  on  source  level 

The first mode has already been implemented; the second one is being designed.


4.  Application  of  the  System 

This  PEARL  Translator  system  has  already  been  successfully  used  in  several  applications. 
Two  of  them  are  completed: 

-  A  training  simulator  for  the  anti-aircraft  tank  'Roland'  (with  6  physically 
distributed  processors) 

-  A  gust  alleviation  system  for  a  light  aircraft 

In both projects PEARL proved highly successful and the translator system fulfilled the
expectations. 


5.  References 

1/ H.-J. Schneider:
   Modulare Software für Flugfuehrung (Modular Software for Guidance and Control),
   Dornier System, Report, June 1978

2/ DIN 66253, Part 1, preliminary standard:
   Programmiersprache PEARL, Basic PEARL,
   Beuth Verlag GmbH, Berlin, Koeln, 1981

3/ H.-J. Schneider:
   PEARL-Softwaresystem für gekoppelte Klein- und Mikrorechner (PEARL Software System
   for Distributed Mini- and Microcomputers),
   PEARL-Rundschau, Vol. 1, No. 4, Dec. 1980 (pp 3-5)

4/ M. Ammann:
   PEARL für verteilte Systeme (PEARL for Distributed Systems),
   Informatik-Fachberichte 39, 1981, Springer Verlag (pp 399-403)

5/ F. Graf:
   PEARL für Mikrocomputer (PEARL for Microcomputers),
   Informatik-Fachberichte 39, 1981, Springer Verlag (pp 413-421)

6/ M. Ammann, P. Elzer:
   Das PEARL-Uebersetzungssystem von Dornier System, Friedrichshafen
   (The PEARL Translator System by Dornier System, Friedrichshafen),
   PEARL-Rundschau, Vol. 2, No. 2, March 1981

7/ PEARL Subset for Avionic Applications; AGARD Advisory Report No. 90, Annex J
   (A Study of Standardization Methods for Digital Guidance and Control Systems),
   May 1977

8/ T. Martin:
   PEARL at the Age of Three; Proceedings of 4th IEEE Software Engineering Conference,
   Sept. 1979 (pp 106-109)


Fig. 1 STRUCTURE OF THE SYSTEM




DISTRIBUTED  AND  DECENTRALIZED  CONTROL 
IN 

FULLY  DISTRIBUTED  PROCESSING  SYSTEMS 

Philip  H.  Enslow  Jr. 

Georgia  Institute  of  Technology 
School  of  Information  and  Computer  Science 
Atlanta,  Georgia  30332 

SUMMARY

Certainly one of the most important factors in designing and implementing fully distributed processing
systems (FDPS) is the issue of distributed and decentralized control. Extremely loose coupling, both
physical and logical, is an essential characteristic of an FDPS. This mode of organization and operation
is quite different from the control of centralized systems. The first step in the development of
distributed and decentralized control has been the examination of various models of control that may
provide these features and the operational characteristics of those models.


1  FULLY  DISTRIBUTED  CONTROL 

1.1 What is a Fully Distributed Processing System?

It  has  been  determined  that  a  high  degree  of  both  distribution  and  decentralization  of  control  is 
essential  if  a  system  is  to  deliver  a  major  proportion  of  those  benefits  being  claimed  for  "distributed 
systems." Not only must the control be distributed, but the hardware and data must also exhibit similar
characteristics. When all three system components, i.e., control, hardware, and data, are sufficiently
distributed, then the system can be characterized as "Fully Distributed." (See the other paper in these
proceedings [Ensl81] for a complete discussion of FDPS's.) This paper will focus on the control aspects
of  FDPS's. 

1.2 Implications of the FDPS Definition on Control


1.2.1  General  Nature  of  FDPS  Executive  Control 

Several of the characteristics of an FDPS are found to directly impact the design and implementation
of the executive control for such a system. These include system transparency to the user,
extremely loose physical and logical coupling, and cooperative autonomy as the basic mode of component
interaction. System transparency means that the FDPS appears to a user as a large uniprocessor which has
available a variety of services. It must be possible for the user to obtain these services by naming
them without specifying any information concerning the details of their physical location. The result is
that system control is left with the task of locating all appropriate instances (copies) of a particular
resource and choosing the instance to be utilized.

"Cooperative  autonomy"  is  another  characteristic  of  an  FDPS  heavily  impacting  its  executive 
control. The "lower-level" control functions of both the logical and physical resource components of an
FDPS  are  designed  to  operate  in  a  "cooperatively  autonomous"  fashion.  Thus,  an  executive  control  must  be 
designed  such  that  any  resource  is  able  to  refuse  a  request  even  though  it  may  have  physically  accepted 
the  message  containing  that  request.  Degeneration  into  total  anarchy  is  prevented  by  the  establishment 
of  a  common  set  of  criteria  to  be  followed  by  all  resources  in  determining  whether  a  request  is  accepted 
and  serviced  as  originally  presented,  accepted  only  after  bidding/negotiation,  or  rejected. 
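
The three possible outcomes of a request under cooperative autonomy can be sketched in Python. The decision rule below is purely hypothetical (the paper only requires that some common set of criteria exists); the names `Reply` and `decide` and the capacity-based criteria are assumptions made for the illustration.

```python
from enum import Enum


class Reply(Enum):
    ACCEPT = "accept"        # serviced as originally presented
    NEGOTIATE = "negotiate"  # accepted only after bidding/negotiation
    REJECT = "reject"


def decide(free_units, requested_units, minimum_useful_units):
    """Common criteria applied by every resource (assumed rule): accept if
    the request fits, make a counter-offer if a useful part of it fits,
    otherwise refuse. Returns (reply, units offered)."""
    if requested_units <= free_units:
        return Reply.ACCEPT, requested_units
    if free_units >= minimum_useful_units:
        return Reply.NEGOTIATE, free_units
    return Reply.REJECT, 0


# Three resources with different free capacity receive the same request.
replies = [decide(free, requested_units=4, minimum_useful_units=2)
           for free in (8, 3, 1)]
for reply, offered in replies:
    print(reply.value, offered)
```

Each resource decides for itself after it has received the message, yet because all of them apply the same rule the requester can predict the form of the answers, which is what keeps autonomy from degenerating into anarchy.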

Another important FDPS characteristic that definitely affects the design of its executive control
is the extremely loose coupling of both physical and logical resources. The components of an FDPS are
connected by communication paths of relatively low bandwidth. The direct sharing of primary memory
between processors is not acceptable. Even though the logical coupling could still be loose with this
physical interconnection mechanism, the presence of a single critical hardware element, the shared memory,
would create fault-tolerance limitations. All communication takes place over "standard" input/output
paths. The actual data rates that can be supported are primarily a function of the distance between
processors and the design of their input/output paths. In any event, the transfer rates possible will
probably be much less than memory transfer rates. This implies that the sharing of information among
components on different processors is greatly curtailed, and system control is forced to work with
information that is usually out-of-date and, as a result, inaccurate.

The  control  of  an  FDPS  requires  the  action  and  cooperation  of  components  at  all  layers  of  the 
system. This means that there are elements of FDPS control present in the lowest levels of the hardware
as well as software components. This paper is primarily interested in the software components of the
FDPS control which are typically referred to as the "executive control."

The executive control is responsible for managing the physical and logical resources of a system.
It  accepts  user  requests  and  obtains  and  schedules  the  resources  necessary  to  satisfy  a  user's  needs.  As 
mentioned  earlier,  these  tasks  are  accomplished  so  as  to  unify  the  distributed  components  of  the  system 
into  a  whole  and  provide  system  transparency  to  the  user. 

1.2.2  Why  Not  Centralised  Control? 

Why  then  is  a  centralized  method  of  control  not  appropriate?  In  systems  utilizing  a  centralized 
executive  control,  all  of  the  control  processes  share  a  single  coherent  and  deterministic  view  of  the 
entire  system  state.  An  FDPS,  though,  contains  only  loosely-coupled  components,  and  the  communication 
among  these  components  is  restricted  and  subject  to  variable  time  delays.  This  means  that  one  cannot 
guarantee that all processes will have the same view of the system state [Jens78]. In fact, it is an
important  characteristic  of  an  FDPS  that  they  will  not  have  a  consistent  view. 




A centralized executive control weakens the fault-tolerance of the overall system due to the
existence of a single critical element, the executive control itself. This obstacle, though, is not
insurmountable, for strategies do exist for providing fault-tolerance in centralized applications.
Garcia-Molina [Garc79], for example, has described a scheme for providing fault-tolerance in a
distributed data base management system with a centralized control. Approaches of this type typically
assume that failures are extremely rare events and that the system can tolerate the dedication of a
relatively long interval of time to reconfiguration. These restrictions are usually unacceptable in an
FDPS environment where it is important to provide fault-tolerance with a minimum of disruption to the
services being supported.

Also,  the  extremely  important  issue  of  overall  system  performance  must  be  considered.  A 
distributed  processing  system  is  expected  to  utilize  a  large  quantity  and  a  wide  variety  of  resources. 
If a completely centralized executive control is implemented, there is a high probability that a
bottleneck will be created in the node executing the control functions. A distributed and decentralized
approach  to  control  attempts  to  remove  this  bottleneck  by  dispersing  the  control  decisions  among  multiple 
components  on  different  nodes. 

1.2.3  Distributed  vs.  Decentralized 

This paper advocates utilizing an approach for the control of an FDPS that is both distributed and
decentralized. There is a clear distinction between the terms "distributed" and "decentralized" as they
are used in the context of this project. "Distributed control" is characterized by having its executing
components physically located on different nodes. This means there are multiple loci of control
activity. In "decentralized control," on the other hand, control decisions are made independently by
separate components at different locations. In other words, there are multiple loci of control decision
making. Thus, distributed and decentralized control has active components located on different nodes, and
those components are capable of making independent control decisions.


2.  ISSUES IN DISTRIBUTED CONTROL

Before  examining  specific  aspects  of  executive  control  in  an  FDPS,  a  look  at  some  of  the  various 
issues  of  distributed  control  is  appropriate.  There  are  three  primary  issues  that  require  examination: 
1)  the  effect  of  the  dynamics  of  FDPS  operation  on  an  executive  control,  2)  the  nature  of  the  information 
an  executive  control  must  maintain,  and  3)  the  principles  to  be  utilized  in  the  design  of  an  executive 
control. 

2.1  Dynamics

Dynamics  are  an  inherent  characteristic  of  the  operation  of  an  FDPS.  They  are  found  in  the  work 
load  presented  to  the  system,  the  availability  of  resources,  and  the  individual  work  requests  submitted. 
The dynamic nature of each of these provides the FDPS executive control with many unique problems.

2.1.1  Workload  Presented  to  the  System 

In  an  FDPS,  work  requests  can  be  generated  either  by  users  or  active  processes  and  can  originate  at 
any  node.  Such  work  requests  potentially  can  require  the  use  of  resources  on  any  processor.  Thus,  the 
collection of executive control procedures must be able to respond to requests arriving at a variety of
locations from a variety of sources. Each request may require system resources located on one or more
nodes, not necessarily including the originating node. One of the goals of an FDPS executive control is
to respond to these requests in a manner such that the load on the entire system is balanced.

2.1.2  Availability  of  Resources 

Another  dynamic  aspect  of  the  FDPS  environment  concerns  the  availability  of  resources  within  the 
system.  As  mentioned  above,  a  request  for  a  service  to  be  provided  by  a  system  resource  may  originate  at 
any  location  in  the  system.  In  addition,  there  may  be  multiple  copies  of  a  resource  or  possibly  multiple 
resources that provide the same functionality (e.g., there may be functionally equivalent FORTRAN
compilers available on several different nodes). Since resources are not immune to failures, the
possibility of losing existing resources or gaining both new and old resources exists. Therefore, an FDPS
executive control must be able to manage system resources in a dynamic environment in which the
availability of a resource is unpredictable.

2.1.3  Individual  Work  Requests 

Finally,  the  dynamic  nature  of  the  individual  work  requests  must  be  considered.  As  mentioned 
above, these work requests define, either directly or indirectly, a set of cooperating processes which
are to be invoked. An indirect definition of the work to be done occurs when the work request is itself
the name of a command file or contains the name of a command file in addition to names of executable
files  or  directly  executable  statements.  A  command  file  contains  a  collection  of  work  requests 
formulated  in  command  language  statements  (see  Figure  1  for  a  description  of  the  syntax  for  a  suitable 
command  language)  that  are  interpreted  and  executed  when  the  command  file  is  invoked.  The  concept  of  a 
command file is similar to that of a procedure file which is available on several current systems.

Management  of  the  processes  for  a  work  request  thus  includes  the  possibility  that  one  or  more  of 
the  processes  are  command  files  requiring  command  interpretation.  The  presence  of  command  files  will 
also  result  in  the  inclusion  of  additional  information  in  the  task  graph  or  possibly  additional  task 
graphs. 

An  important  objective  of  work  request  management  is  to  control  the  set  of  processes  and  do  so  in 
such a manner that the inherent parallelism present in the operations to be performed is exploited to the
maximum.  In  addition,  situations  in  which  one  or  more  of  the  processes  fail  must  also  be  handled. 




2.2  Information 

All types of executive control systems require information in order to function and perform their
mission. The characteristics of the information available to the executive control are one aspect of
fully distributed systems that results in the somewhat unique control problems that follow:

1.  Because of the nature of the interconnection links and the delays inherent in any
communication process, system information on hand is always out of date.

2.  Because of the autonomous nature of operation of all components, each processor can make
"its own decision" as to how to reply to an inquiry; therefore, there is always the
possibility that information received is incomplete and/or inaccurate.

3.  Because of the inherent time delays experienced in exchanging information among processes
on different nodes, some information held by two processes may conflict during a
particular time interval.


2.3  Design  Principles 

Designing  the  system  control  functions  required  for  the  extremely  loosely-coupled  environment  of  an 
FDPS and implementing those functions to operate in that environment will certainly require the
application of some new design principles in addition to those commonly utilized in operating systems for
centralized systems. These design principles must address at least the two distinguishing
characteristics of FDPS's:

-  System information available, and

-  Nature  of  resource  control 


2.3.1  System  Information 

The various functions of an FDPS executive control must be designed recognizing that system
information  is: 

-  "Expensive"  to  obtain 

-  Never  fully  up-to-date 

-  Usually  incomplete 

-  Often  inaccurate 


All of these characteristics of system information result from the fact that the components
providing the information are interconnected by relatively narrow bandwidth communication paths and that those
components are operating somewhat autonomously with the possibility that their state may change
immediately after a status report has been transmitted. Further, it is important to note that the mere
existence  (or  disappearance)  of  a  resource  is  not  of  interest  to  a  specific  component  of  the  FDPS 
executive  control  until  that  component  needs  that  information. 

The  design  principles  applying  to  system  information  that  have  been  identified  thus  far  include  the 
following: 

1.  Economy of communication: ask for only the information required.

2.  Resiliency: be prepared to recover and continue in the absence of replies.

3.  Flexibility:  be  prepared  to  recover  and  continue  if  the  information  provided  proves  to  be 
inaccurate  when  it  is  utilized. 
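
The three principles above can be sketched in Python as follows. This is only an illustration under stated assumptions: the functions `gather` and `acquire`, and the callables `ask` and `claim` that stand in for message exchanges with remote nodes, are hypothetical and do not come from this paper.

```python
from typing import Callable, Dict, Iterable, Optional


def gather(nodes: Iterable[str],
           ask: Callable[[str], Optional[bool]]) -> Dict[str, bool]:
    """Collect availability replies; a reply of None means the node did not answer."""
    replies = {}
    for node in nodes:          # economy: the caller names only the nodes of interest
        answer = ask(node)
        if answer is not None:  # resiliency: a silent node is skipped, not fatal
            replies[node] = answer
    return replies


def acquire(replies: Dict[str, bool],
            claim: Callable[[str], bool]) -> Optional[str]:
    """Try each node reported available; a report may be stale, so a claim can fail."""
    for node, available in replies.items():
        if available and claim(node):  # flexibility: move on after a stale report
            return node
    return None
```

A caller that receives `None` from `acquire` knows only that none of the reported resources could be obtained, which is consistent with information that is never fully up-to-date.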


2.3.2  Resource Control

Since all of the resources are operating under local control under the policies of cooperative
autonomy, all requests for service, or the utilization of any resource such as a file, must be effected
through negotiations that culminate in positive acknowledgements by the server. In all instances, the
control function requesting a service or a resource must be prepared for refusal.


3.  CHARACTERIZATION OF FDPS WORK REQUESTS

3.1  The Work Request

One  of  the  goals  of  an  FDPS  is  the  ability  to  provide  a  hospitable  environment  for  solving  problems 
that  allows  the  user  to  utilize  the  natural  distribution  of  data  to  obtain  a  solution  which  may  take  the 
form of an algorithm consisting of concurrent processes. The expression of the solution is in terms of a
work request that describes a series of cooperating processes, the connectivity of these processes (how
the processes communicate), and the data files utilized by these processes. This description involves
only logical entities and does not contain any node-specific information. A description of one command
language capable of expressing requests for work in this fashion can be found in [Akin78] (see Figure 1).

3.2  Impact  of  the  Work  Request  on  the  Control 

The  nature  of  allowable  work  requests  (not  just  the  syntax  but  what  can  actually  be  accomplished 
via  the  work  request)  determines  to  a  large  extent  the  functionality  of  an  executive  control.  Therefore, 
it  is  important  to  examine  the  characteristics  of  work  requests  and  further  to  see  how  variations  in 
these characteristics impact the strategies utilized by an FDPS executive control.




Five basic characteristics of work requests have been identified:

1.  the external visibility of references to resources required by the task,

2.  the presence of any interprocess communication (IPC) specifications,

3.  the number of concurrent processes,

4.  the nature of the connectivity of processes, and

5.  the  presence  of  command  files. 

3.2.1  Visibility  of  References  to  Resources 

References to the resources required to satisfy a work request may either be visible prior to the
execution of a process associated with the work request or embedded in such a manner that some part of
the work request must be executed to reveal the reference to a particular resource. A resource is made
"visible" either by the explicit statement of the reference in the work request or through a declaration
associated with one of the resources referenced in the work request. An example of the latter means of
visibility is a file system in which external references made from a particular file are identified and
stored in the "header" portion of the file. In this case, the identity of a reference can be obtained by
simply accessing the header.

The greatest impact of the visibility characteristic of resource requirements occurs in the
construction of task graphs and the distribution of work. The time at which resource requirements are
detected and resolved determines when and how parts of the task graph can be constructed. Similarly,
some work cannot be distributed until certain details are resolved. For example, consider a case where
resource references cannot be resolved until execution time. Assume there exist two processes X and Y
where process X has a hidden reference to process Y. An executive control cannot consider Y in the work
distribution decision that is made in order to begin execution of X. The significance of this is that
certain  work  distribution  decisions  may  not  be  "globally  optimal"  because  total  information  was  not 
available  at  the  time  the  decision  was  made. 

3.2.2  The Number of Concurrent Processes

A  work  request  can  either  specify  the  need  to  execute  only  a  single  process  or  the  execution  of 
multiple  processes  which  may  possibly  be  executed  concurrently.  Obviously  with  multiple  processes,  more 
resource availability information must be maintained; and there is a corresponding increase in the data
to the work distribution and work allocation phases of control. In addition, the complexity of the work
distribution decision algorithm increases with more resources needing to be allocated and multiple
processes needing scheduling. The complexity of controlling the execution of the work request is also
increased with the presence of multiple processes since the control must monitor multiple processes for
each  work  request. 

3.2.3  The  Presence  of  Interprocess  Communication 

The  problems  described  in  the  previous  paragraph  are  amplified  by  the  presence  of  communication 
connections between processes. When interprocess communication is described in a work request, the work
distribution decision must consider the requirement for communication links. In addition, a compromise
must be made in order to satisfy the conflicting goals of maximizing the inherent parallelism of the
processes of the work request and minimizing the cost of communication among these processes. The
control activity required during execution is also impacted by the presence of interprocess
communication. It must provide the means for passing messages, buffering messages, and providing
synchronization to ensure that a reader does not underflow and a writer does not overflow the message buffers.
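
The synchronization requirement can be illustrated with a bounded buffer. The following Python sketch uses the standard library `queue.Queue`, whose `put` blocks when the buffer is full and whose `get` blocks when it is empty. The function `run_pipe` and its parameters are hypothetical, and the threads stand in for processes that in an FDPS could reside on different nodes.

```python
import queue
import threading


def run_pipe(messages, capacity=2):
    """Pass messages from a writer thread to a reader through a bounded buffer."""
    buffer = queue.Queue(maxsize=capacity)
    received = []
    done = object()  # sentinel marking the end of the message stream

    def writer():
        for m in messages:
            buffer.put(m)   # blocks while the buffer is full (no overflow)
        buffer.put(done)

    def reader():
        while True:
            m = buffer.get()  # blocks while the buffer is empty (no underflow)
            if m is done:
                break
            received.append(m)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With a small `capacity`, the writer is repeatedly suspended until the reader catches up, which is the behavior the control must provide for communicating processes.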

3.2.4  The Nature of Process Connectivity

There  are  a  variety  of  techniques  available  for  expressing  interprocess  communication  including 
pipes (see [Ritc78]) and ports (see [Balz71, Have78, Suns77, Zuck77]). There are a number of approaches
to realizing these different forms of interprocess communication. The main impact on an executive
control,  though,  is  in  those  components  controlling  process  execution. 

3.2.5  The  Presence  of  Command  Files 

A  command  file  is  composed  of  work  requests.  Execution  of  a  work  request  that  references  a  command 
file results in a new issue dealing with the construction of task graphs. This issue is concerned with
whether a new task graph should be constructed to describe the new work request or whether these new
processes should be included in the old task graph. The difference between these two approaches becomes
important during work distribution. It is assumed that the work distribution decision will be made only
with the information available in the task graph. Thus, with the first approach, only those tasks in the
new work request are considered, while the second approach provides the ability to take into consideration
the assignment of tasks from previous work requests.

3.3  A Classification of Work Requests

This examination of the characteristics of FDPS work requests has led to the identification of
five basic attributes which have significant impact on an executive control. In Figure 2, all possible
types of work requests are enumerated, resulting in 32 different forms of work requests. It should be
noted, though, that 16 of these (those with an asterisk beside the task number) contain conflicting
characteristics and thus are impossible.
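
The count of 32 follows from five attributes that each take one of two values. A brief Python sketch of the enumeration is given below. The attribute names are hypothetical labels for the five characteristics, and treating each as a simple true/false flag is an assumption made for illustration. The rule that marks 16 forms as conflicting is given in Figure 2 and is not reproduced here.

```python
from itertools import product

# Hypothetical labels for the five characteristics of Section 3.2.
ATTRIBUTES = (
    "references_visible",
    "ipc_specified",
    "multiple_processes",
    "connectivity",   # binary encoding of this attribute is an assumption
    "command_files",
)


def enumerate_forms():
    """Return every combination of the five two-valued attributes."""
    return [dict(zip(ATTRIBUTES, values))
            for values in product((False, True), repeat=len(ATTRIBUTES))]
```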


4.  CHARACTERISTICS OF FDPS CONTROL

4.1  Approaches  to  Implementing  FDPS  Executive  Control 

There  are  two  basically  different  approaches  available  for  implementing  an  operating  system  for  a 
distributed processing system, the base-level approach and the meta-system approach [Thom78]. The
base-level approach does not utilize any existing software and, therefore, requires the development of all new
software. This includes software for all local control functions such as memory management and process
management.  In  contrast,  the  meta-system  approach  utilizes  the  "existing"  operating  systems  (called 
local  operating  systems  (LOS))  from  each  of  the  nodes  of  the  system.  Each  LOS  is  "interfaced"  to  the 
distributed  system  by  a  network  operating  system  (NOS)  which  is  designed  to  provide  high  level  services 
available  on  a  system-wide  basis.  The  meta-system  approach  is  usually  preferred  due  to  the  availability 
of existing software to accomplish local management functions, thus reducing development costs [Thom78].




Figure 3 depicts a logical model applicable to an FDPS executive control utilizing either approach.
The LOS handles the low-level (processor-specific) operations required to directly interface with users
and resources. In the meta-system approach, the LOS represents primarily the operating systems presently
available for nodes configured in stand-alone environments. The LOS resulting from a base-level approach
has similar functionality; however, it represents a new design, and certain features may be modified in
order to allow the NOS to provide certain functions normally provided by the LOS. Any "network"
operations are performed by the NOS. System unification is realized through the interaction of NOS
components, possibly residing on different processors, acting in cooperation with appropriate LOS
components. Communication among the components is provided by the message handler which utilizes the
message transport services.

4.2  Information Requirements

Two types of information are required by an executive control: information concerning the structure
of  the  set  of  tasks  required  to  satisfy  the  work  request  and  information  about  system  resources.  This 
data  is  maintained  in  a  variety  of  data  structures  by  a  number  of  different  components. 

4.2.1  Information  Requirements  for  Work  Requests 

Each work request identifies a set of cooperating tasks, nodes in a logical network that cooperate
in  execution  to  satisfy  a  request  and  the  connectivity  of  those  nodes.  Figure  1  illustrates  the  notation 
used  in  this  project  to  express  work  requests.  An  example  of  a  work  request  using  this  notation  is 
presented  in  Figure  4.  Work  requests  as  linear  textual  forms  can  be  easily  accepted  and  manipulated  by 
the  computer  system;  however,  task  graphs,  which  are  an  internal  control  structure  used  to  describe  work 
requests, must be represented in a manner such that the linkage information is readily available. This
can  take  the  form  of  the  explicit  linking  of  node  control  blocks  (Figure  5)  or  an  interconnection  matrix 
(Figure  6). 

Information concerning a particular task, i.e., logical node, is maintained in a node control block
(Figure 5). Associated with each logical node is an execution file, a series of input files, and a
series of output files. The node control block contains information on each of these entities that
includes the name of the resource, the locations of possible candidates that might provide the desired
resource, and the location of the candidate resource chosen to be utilized in the satisfaction of the
work request. In addition to this information, the node control block maintains a description of all
interprocess communication (IPC) in which the node is a party. This consists of a list of input ports
and output ports. (Interprocess communication is a term describing the exchange of messages between
cooperating processes of a work request.) Typically, a message is "sent" when it is written to the
output port of a process. The message is then available for consumption by any process possessing an input
port that is connected to the previously mentioned output port. The message is actually consumed or
accepted when the process owning the connected input port executes a READ on that port.
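
The contents of a node control block as described above might be represented as in the following Python sketch. The class and field names (`ResourceEntry`, `NodeControlBlock`, `candidates`, `chosen`, and so on) are hypothetical; the layout shown in Figure 5 may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceEntry:
    name: str                                            # logical name of the file
    candidates: List[str] = field(default_factory=list)  # nodes that might supply it
    chosen: Optional[str] = None                         # node selected for this request


@dataclass
class NodeControlBlock:
    execution_file: ResourceEntry
    input_files: List[ResourceEntry] = field(default_factory=list)
    output_files: List[ResourceEntry] = field(default_factory=list)
    input_ports: List[str] = field(default_factory=list)   # IPC ports read by the node
    output_ports: List[str] = field(default_factory=list)  # IPC ports written by the node
```

The `chosen` field stays empty until work distribution selects one of the candidates, which reflects the order in which the information becomes known.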

A global view of interprocess communication is provided by the node interconnection matrix (Figure
6). This structure indicates the presence or absence of an IPC link between an output port of one node
and  an  input  port  of  another  node.  Thus,  links  are  assumed  to  carry  data  in  only  a  single  direction. 
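
A minimal Python sketch of such a matrix follows. It assumes that rows correspond to output ports and columns to input ports, each identified by a (node, port) pair. The class `InterconnectionMatrix` and its methods are hypothetical and are not taken from Figure 6.

```python
class InterconnectionMatrix:
    """Boolean matrix: rows are output ports, columns are input ports."""

    def __init__(self, output_ports, input_ports):
        self.rows = {port: i for i, port in enumerate(output_ports)}
        self.cols = {port: j for j, port in enumerate(input_ports)}
        self.cells = [[False] * len(self.cols) for _ in self.rows]

    def connect(self, out_port, in_port):
        """Record a one-way IPC link from an output port to an input port."""
        self.cells[self.rows[out_port]][self.cols[in_port]] = True

    def linked(self, out_port, in_port):
        """Report whether a link from out_port to in_port has been recorded."""
        return self.cells[self.rows[out_port]][self.cols[in_port]]
```

Because only output ports index the rows, a link in the reverse direction cannot be expressed, which matches the single-direction assumption.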

An  example  of  a  task  graph  resulting  from  the  work  request  in  Figure  4  utilizing  the  direct  linking 
of node control blocks is presented in Figure 7. Figure 8 illustrates the utilization of an
interconnection matrix.

4.2.2  Information  Requirements  for  System  Resources 

Regardless  of  how  the  executive  control  is  realized  (i.e.,  how  the  components  of  the  executive 
control  are  distributed  and  how  the  control  decisions  are  decentralized),  information  concerning  all 
system  resources  (processors,  communication  lines,  files,  and  peripheral  devices)  must  be  maintained. 
This information includes at a minimum an indication of the availability of resources (available,
reserved, or assigned). Preemptable resources (e.g., processors and communication lines) capable of
accommodating more than one user at a time may also have associated with them utilization information designed
to  guide  an  executive  control  in  its  effort  to  perform  load  balancing. 

As  discussed  below,  there  are  a  number  of  techniques  that  may  be  employed  to  gather  and/or  maintain 
the  system  resource  information. 

4.3  Basic Operations of FDPS Control

The  primary  task  of  an  executive  control  is  to  process  work  requests  that  can  best  be  described  as 
logical networks. A node of a logical network specifies an execution file that may either contain object
code  or  commands  (work  requests),  input  files,  and  output  files.  These  files  may  reside  on  one  or  more 
physical  nodes  of  the  system  and  there  may  be  multiple  copies  of  the  same  file  available.  Thus,  to 
process  a  work  request,  an  FDPS  executive  control  must  perform  three  basic  operations:  1)  gather 
information, 2) distribute the work and allocate resources, and 3) initiate and monitor task execution.
These operations need not be executed in a purely serial fashion but may take a more complex form with
executive  control  operations  executed  simultaneously  or  concurrently  with  task  execution  as  the  need 
arises. 


Examination of the basic operations in further detail (Figure 9) reveals some of the variations
possible in the handling of work requests. Two steps exist in information gathering: 1) collecting
information about task requirements for the work request and 2) identifying the resources available for
satisfying the request requirements. Information gathering is followed by the task of distributing the
work and allocating resources. If this operation is not successful, three alternatives are available.
First, more information on resource availability can be gathered in an attempt to formulate a new work
distribution. There may have been a change in the status of some resources since the original request
for availability information. Second, more information can be gathered as above, but this time the
requester will indicate a willingness to "pay more" for the resources. This is referred to as bidding to
a higher level. Finally, the user can simply be informed that it is impossible to satisfy his work
request.
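
The handling of an unsuccessful work distribution can be sketched as a simple retry loop. The Python fragment below is an illustration only. The function `process_request`, the callables `gather` and `distribute`, the integer bid levels, and the order in which the three alternatives are tried are all assumptions that are not prescribed by the text.

```python
from typing import Callable, List, Optional


def process_request(gather: Callable[[int], List[str]],
                    distribute: Callable[[List[str]], Optional[dict]],
                    max_bid: int = 2) -> Optional[dict]:
    """Return a work distribution, or None if the request cannot be satisfied."""
    for bid in range(max_bid + 1):       # alternative 2: bid to a higher level
        for _attempt in range(2):        # alternative 1: gather again at the same bid
            available = gather(bid)      # resource status may have changed meanwhile
            plan = distribute(available)
            if plan is not None:
                return plan
    return None                          # alternative 3: inform the user of failure
```

A caller that receives `None` would report to the user that the work request cannot be satisfied.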




4.3.1  Information Gathering

Upon  receiving  a  work  request,  the  first  task  of  the  control  is  to  discover  what  resources  are 
needed to satisfy the work request (Figure 10) and which resources are available to fill these needs
(Figure 11). Each work request includes a description of a series of tasks and the connectivity of those
tasks. Associated with each task is a series of files. One is distinguished as the execution file and
the rest are input/output files. The executive control must first determine which files are needed. It
then must examine each of the execution files to determine the nature of its contents (executable code or
commands). Each task will need a processor resource(s), and those tasks containing command files will
also  require  a  command  interpreter. 

An  FDPS  executive  control  must  also  determine  which  of  the  system  resources  are  available.  For 
nonpreemptable  resources,  the  status  of  a  resource  can  be  either  "available,"  "reserved,"  or  "assigned." 
A  reservation  indicates  that  a  resource  may  be  used  in  the  future  and  that  it  should  not  be  given  to 
another  user.  Typically,  there  is  a  time-out  associated  with  a  reservation  that  results  in  the  automatic 
release  of  the  reservation  if  an  assignment  is  not  made  within  a  specified  time  interval.  The  idea  here 
is to free resources that otherwise would have been left unavailable by a lost process. The process may
be lost because it failed, its processor failed, or the communication link to the node housing the
particular resource failed. An assignment, on the other hand, indicates that a resource is
dedicated to a user until the user explicitly releases that assignment. Preemptable resources may be
accessed by more than one concurrent user and thus can be treated in a different manner. For these
resources, the status may be indicated by more continuous values (e.g., the utilization of the resource)
rather than the discrete values described above.
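
The three discrete states and the reservation time-out can be illustrated by the Python sketch below. The class `NonpreemptableResource`, the explicit `now` argument used in place of a real clock, and the omission of any record of which user holds the resource are simplifying assumptions.

```python
from enum import Enum


class Status(Enum):
    AVAILABLE = "available"
    RESERVED = "reserved"
    ASSIGNED = "assigned"


class NonpreemptableResource:
    def __init__(self, timeout):
        self.timeout = timeout  # time units before an unused reservation lapses
        self.status = Status.AVAILABLE
        self.reserved_at = None

    def _expire(self, now):
        """Release a reservation that was not converted to an assignment in time."""
        if self.status is Status.RESERVED and now - self.reserved_at >= self.timeout:
            self.status = Status.AVAILABLE

    def reserve(self, now):
        self._expire(now)
        if self.status is not Status.AVAILABLE:
            return False
        self.status, self.reserved_at = Status.RESERVED, now
        return True

    def assign(self, now):
        self._expire(now)
        if self.status is not Status.RESERVED:
            return False
        self.status = Status.ASSIGNED  # held until explicitly released
        return True

    def release(self):
        self.status = Status.AVAILABLE
```

In this sketch the time-out is checked lazily, whenever the resource is next consulted, so a reservation left behind by a lost process is reclaimed without any separate timer activity.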

4.3.2  Work  Distribution  and  Resource  Allocation 

The FDPS executive control must determine the work distribution and the allocation of system
resources (Figures 12 and 13). This process involves choosing from the available resources those that are
to be utilized. This decision is designed to achieve several goals such as load balancing, maximum
throughput, and minimum response time. It can be viewed as an optimization problem similar in many
respects to that discussed by Morgan [Morg77].
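
As one illustration of a work distribution decision aimed at load balancing, the following Python sketch applies a simple greedy heuristic: the most expensive tasks are placed first, each on the least-loaded candidate node. This heuristic is a substitute chosen for brevity and is not the optimization approach of [Morg77]. The function `distribute`, the task costs, and the load figures are hypothetical.

```python
def distribute(tasks, load):
    """tasks: {task: (cost, [candidate nodes])}; load: {node: current load}.

    Returns {task: node}, or None when some task has no usable candidate.
    """
    load = dict(load)  # work on a copy of the caller's view of the system
    plan = {}
    # Place the most expensive tasks first.
    for task, (cost, candidates) in sorted(tasks.items(), key=lambda kv: -kv[1][0]):
        usable = [n for n in candidates if n in load]
        if not usable:
            return None  # no candidate node is known to be available
        node = min(usable, key=lambda n: (load[n], n))  # least loaded, ties by name
        plan[task] = node
        load[node] += cost
    return plan
```

A result of `None` corresponds to an unsuccessful distribution, after which the control may gather more information, bid to a higher level, or inform the user.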

Once an allocation has been determined, the chosen resources are allocated and the processes
comprising the task set are scheduled and initiated. If a process cannot be immediately scheduled, it may
be queued and scheduled at a later time. When it is scheduled, a process control block and any other
execution-time  data  structures  must  be  created. 

4.3.3  Information Recording

Information is recorded both as a result of management actions and to provide a means of maintaining
a historical record or audit trail of system activity. The information recording resulting from
management actions maintains the system state and provides information for decision making. The
historical information is useful in monitoring system security. It provides a means to examine past
activity on a system in order to determine if a breach of security occurred or how a particular problem
or  breach  of  security  may  have  occurred. 

Management information is maintained in various structures, including the task graph. The task
graph is used to maintain information about the structure of an individual work request, and, thus, its
contents change as progress on the work request proceeds. A task graph is created when a work request is
first discovered, and information is then constantly entered into the structure as work progresses
through information gathering to work distribution and resource allocation and on to task execution. The
task graph remains active until completion of the work request.

Much of the information contained in the task graph is applicable to historical records. In fact,
the task graph can be used to house historical information as it is gathered during work request
processing. Upon completion of the work request, the historical information is extracted and entered into the
permanent historical file. Alternatively, the historical file can be created directly, skipping the
intermediate task graph structure.

4.3.4 Task Execution

Finally, an executive control must monitor the execution of active processes. This includes
providing interprocess communication, handling requests from active processes, and supervising process
termination. The activities associated with interprocess communication include establishing
communication paths, buffering messages, and synchronizing communicating processes. The latter activity is
necessary to protect the system from processes that flood the system with messages before another process has
time to absorb the messages. Active processes may also make requests to the executive control. These
may take the form of additional work requests or requests for additional resources. Work requests may
originate from either command files or files containing executable code.

An executive control must also detect the termination of processes. This includes both normal and
abnormal termination. After detecting process termination, it must inform processes needing this
information that termination has occurred, close open files, and clean up other loose ends.
Finally, when the last process of a work request has terminated, it must inform the
originator of the request of the completion of the request.

4.3.5  Fault  Recovery 

If portions (tasks) of the work request are being performed on different processors, there is
inherently a certain degree of fault recovery possible. The problem is in exploiting that capability.
The ability to utilize "good" work remaining after the failure of one or more of the processors executing
a work request depends on the recovery agent having knowledge of the location of that work and the
ability of the recovery agent to reestablish the appropriate linkages to the new locations for the
portions of the work that were being executed on the failed processor(s).




5. VARIATIONS IN FDPS CONTROL MODELS

There is an extremely large number of features by which variations in distributed control models
can be characterized. Of these, only a few basic attributes appear to deserve attention. These include
the nature of how and when a task graph is constructed, the maintenance of resource availability
information, the allocation of resources, process initiation, and process monitoring. In this section,
these issues are examined; but again, since the number of variations possible in each issue is rather
large, only those choices considered significant are discussed. Table 2 contains a summary of the
problems that have been identified and possible solutions (significant and reasonable solutions) to these
problems.

5.1 Task Graph Construction

The task graph is a data structure used to maintain information about the applicable task set. The
nodes of a task graph represent the tasks of the task set, and the arcs represent the connectivity or
flow of information between tasks. There are basically four issues in task graph construction: 1) who
builds a task graph, 2) what is the basic structure of a task graph, 3) where are the copies of a task
graph stored, and 4) when is a task graph built.
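
As an illustration of the structure just described, the following is a minimal sketch of a task graph in which tasks are nodes and arcs record the flow of information between tasks. The class name TaskGraph and its methods are assumptions made for this example; the paper does not prescribe a representation.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Tasks are nodes; arcs record information flow (IPC) between tasks."""
    tasks: set = field(default_factory=set)
    arcs: set = field(default_factory=set)   # (sender, receiver) pairs

    def add_task(self, task: str) -> None:
        self.tasks.add(task)

    def add_arc(self, sender: str, receiver: str) -> None:
        # Both endpoints become members of the task set if not already present.
        self.tasks.update((sender, receiver))
        self.arcs.add((sender, receiver))

    def receivers_of(self, task: str) -> list:
        """Tasks that receive information directly from `task`."""
        return sorted(r for s, r in self.arcs if s == task)


graph = TaskGraph()
graph.add_arc("c", "d")      # IPC from process c to process d
graph.add_task("e")          # a task with no communication
```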

The identity of the component or components constructing the task graph is an issue that presents
three basic choices. First, a central node can be responsible for the construction of task graphs for
all work requests. Another choice utilizes the control component of the node receiving the work request
to construct the task graph. Finally, the construction of the task graph can be distributed among
several components. In particular, the nodes involved in executing individual tasks of the work request
can be responsible for constructing those parts of the task graph that they are processing.


The general nature of the task graph itself provides two alternatives for the design of an
executive control. What is of concern is not the content of a task graph but rather its basic
structure. One alternative is to maintain a task graph in a single structure regardless of how execution
is distributed. The other choice is to maintain the task graph as a collection of subgraphs with each
subgraph representing a part of the work request. For example, a subgraph can represent that portion of
the work request that is executed on the particular node at which that subgraph is stored.
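
The second alternative, a collection of subgraphs with one subgraph per executing node, can be sketched as follows. This is only an illustration under stated assumptions: the function name split_by_node, the placement dictionary, and the decision to store a crossing arc on both sides are all hypothetical.

```python
def split_by_node(arcs, placement):
    """Split a task graph into per-node subgraphs.

    arcs      -- iterable of (sender, receiver) task pairs
    placement -- dict mapping each task to the node that executes it
    Returns a dict: node -> {"tasks": set, "local": set, "remote": set},
    where "remote" holds arcs that cross to or from another node.
    """
    subgraphs = {}
    for task, node in placement.items():
        sub = subgraphs.setdefault(
            node, {"tasks": set(), "local": set(), "remote": set()})
        sub["tasks"].add(task)
    for sender, receiver in arcs:
        src, dst = placement[sender], placement[receiver]
        if src == dst:
            subgraphs[src]["local"].add((sender, receiver))
        else:
            # Each side keeps the crossing arc so it can locate its partner.
            subgraphs[src]["remote"].add((sender, receiver))
            subgraphs[dst]["remote"].add((sender, receiver))
    return subgraphs


# Process c runs on node 1 and sends to process d on node 2.
subgraphs = split_by_node([("c", "d")], {"c": 1, "d": 2})
```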

Another issue of task graph construction concerns where the various copies of the task graph are
stored. If the control maintains a task graph as a unified structure representing the complete set of
tasks for a work request, this structure may either be stored on a single node, or redundant copies can
be stored on multiple nodes. The single node can either be a central node that is used to store all task
graphs, the node at which the original work request arrived (the source node), or a node chosen for its
ability to provide this work request with optimal service. If the task graph is divided into several
subgraphs, these can be maintained on multiple nodes.

Finally, there is an issue concerning the timing of task graph construction in the sequence of
steps that define work request processing. Two choices are available: 1) the task graph can be
constructed completely, at least to the maximum extent possible, before execution is begun, or 2) the
task graph can be constructed incrementally as execution progresses.

5.2 Resource Availability Information

Another issue with possible variations in control models is the maintenance of resource
availability information. What is of importance here is "Who maintains this information" and "Where is
this information maintained." A particular model need not uniformly apply the same technique for
maintaining resource availability information to all resources. Rather, the technique best suited to a
particular resource class may be utilized.


The responsibility for maintaining resource availability information can be delegated in a variety
of ways. The centralized approach involves assigning a single component this responsibility. In this
situation, requests and releases for resources flow through the specialized component which maintains the
complete resource availability information in one location.


A variation of this technique maintains complete copies of the resource availability information at
several locations [Caba79a,b]. Components at each of these locations are responsible for updating their
copy of the resource availability information in order to keep it consistent with the other copies. This
requires a protocol to insure that consistency is maintained. For example, two components should not
release a file for writing to different users at the same time. To provide this control, messages
containing updates for the information tables must be exchanged among the components. In addition, a
strategy for synchronizing the release of resources is required. An example of such a strategy is found
in [Caba79a,b] where a baton is passed around the network. The holder of the baton is permitted to
release resources.
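
The baton idea can be sketched in a few lines. The following is a single-process simulation written for illustration only; it is not the protocol of [Caba79a,b]. The class BatonRing, its method names, and the single shared table standing in for the replicated copies are assumptions.

```python
class BatonRing:
    """Components in a logical ring; only the baton holder may release."""

    def __init__(self, components):
        self.components = list(components)
        self.holder = 0          # index of the current baton holder
        self.released = {}       # resource -> user it was released to

    def pass_baton(self):
        self.holder = (self.holder + 1) % len(self.components)

    def release(self, component, resource, user):
        """Release `resource` to `user`; refused unless `component` holds the baton."""
        if self.components[self.holder] != component:
            return False
        if resource in self.released:
            return False         # already released to another user
        self.released[resource] = user
        return True


ring = BatonRing(["A", "B"])
first = ring.release("B", "file1", "user2")    # B lacks the baton: refused
second = ring.release("A", "file1", "user1")   # A holds the baton: granted
ring.pass_baton()
third = ring.release("B", "file1", "user2")    # file1 is already released
```

Because at most one component holds the baton at any moment, two components can never release the same file to different users at the same time, which is the conflict the text warns against.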


Another approach exhibiting more decentralization requires dividing the collection of resources
into subsets or classes and assigning separate components to each subset. Each component is responsible
for maintaining resource availability information on a particular subset. In this case, requests for
resources can only be serviced by the control component responsible for that resource. Resources may be
named in a manner such that the desired manager is readily identifiable. Alternatively, a search may be
required in order to locate the appropriate manager. This search may involve passing the request from
component to component until one is found that is capable of performing the desired operation.
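
A minimal sketch of such a search follows, assuming the managers can be visited in a fixed circular order. The function find_manager and the manager and resource names are hypothetical and serve only to illustrate passing a request from component to component.

```python
def find_manager(resource, managers, start=0):
    """Pass a request around the managers until one owns `resource`.

    managers -- list of (name, set_of_managed_resources) pairs
    Returns (manager name, number of hops), or (None, hops) if nobody owns it.
    """
    count = len(managers)
    for hops in range(count):
        name, managed = managers[(start + hops) % count]
        if resource in managed:
            return name, hops
    return None, count


managers = [
    ("m_files", {"x", "y"}),
    ("m_cpu", {"cpu0"}),
    ("m_links", {"line1"}),
]
owner, hops = find_manager("line1", managers)
missing, tried = find_manager("tape0", managers)
```

The hop count shows the cost of the search, which is what a naming scheme that identifies the manager directly would avoid.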

Preemptable resources which can be shared by multiple concurrent users (e.g., processors and
communication lines) do not necessarily require the maintenance of precise availability information. For
these resources it is reasonable to maintain only approximate availability information because such
[...]




5.3 Allocating Resources

One of the major problems experienced in the allocation of resources is concurrency control. In a
hospitable environment, it is possible to ignore concurrency control. The users are given the
responsibility of insuring that access to a shared resource such as a file is handled in a consistent
manner. In other environments, for example that presented by an FDPS, this is an important issue. In an
FDPS, the problem is even more difficult than in a centralized system due to the loose coupling inherent
in the system.

There are basically two approaches to solving the problem of concurrent requests for shared
resources. The first utilizes the concept of a reservation. Prior to the allocation of resources (possibly
when resource availability information is acquired), a resource may be reserved. The reservation is
effective for only a limited period (a period long enough to make a work distribution decision and
allocate the resources determined by the decision) and prevents other users from acquiring the resource.
The other solution to this problem is to make the work distribution decision without the aid of
reservations. If resources cannot be allocated, the executive control will either wait until they can be
allocated or attempt a new work distribution.
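
The reservation approach can be sketched as a table of time-limited entries. This is an illustrative sketch, not a description of any FDPS implementation: the class ReservationTable, the ttl parameter, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are assumptions.

```python
class ReservationTable:
    """Time-limited reservations; time is passed in explicitly for clarity."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.reserved = {}    # resource -> (holder, expiry time)
        self.allocated = {}   # resource -> holder

    def reserve(self, resource, holder, now):
        if resource in self.allocated:
            return False
        current = self.reserved.get(resource)
        if current is not None and current[1] > now and current[0] != holder:
            return False      # someone else holds a live reservation
        self.reserved[resource] = (holder, now + self.ttl)
        return True

    def allocate(self, resource, holder, now):
        """Allocation succeeds only under the holder's unexpired reservation."""
        current = self.reserved.get(resource)
        if current is None or current[0] != holder or current[1] <= now:
            return False
        del self.reserved[resource]
        self.allocated[resource] = holder
        return True


table = ReservationTable(ttl=5)
a_reserved = table.reserve("x", "A", now=0)     # granted until time 5
b_blocked = table.reserve("x", "B", now=3)      # A's reservation is still live
b_reserved = table.reserve("x", "B", now=6)     # A's reservation has lapsed
a_too_late = table.allocate("x", "A", now=7)    # A no longer holds it
b_allocated = table.allocate("x", "B", now=7)   # B allocates under its reservation
```

The limited period matters: user A never allocated the file, and the lapse of its reservation is what allows user B to proceed without any explicit release.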

5.4 Process Initiation

Several  issues  arise  concerning  process  initiation.  Chief  among  these  is  the  distribution  of 
responsibility.  There  are  a  large  number  of  organizations  possible,  but  only  a  few  are  reasonable.  The 
basic  organizations  utilize  either  a  single  manager,  a  hierarchy  of  managers,  or  a  collection  of 
autonomous  managers.  Two  approaches  result  from  the  single  manager  concept.  In  the  first  organization, 
a  central  component  is  in  charge  of  all  work  requests  and  the  processes  resulting  from  these  work 
requests.  All  decisions  concerning  the  fate  of  processes  and  work  requests  are  made  by  this  component. 
A  variation  on  this  organization  assigns  responsibility  at  the  level  of  work  requests.  In  other  words, 
separate  components  are  assigned  to  each  work  request.  Each  component  makes  all  decisions  concerning  the 
fate  of  a  particular  work  request  and  its  processes. 

Management can also be organized in a hierarchical manner. There are a variety of ways
hierarchical management can be realized, but we will concentrate on only two, the two-level hierarchy and the
n-level hierarchy. The two-level hierarchy has at the top level a component that is responsible for an
entire work request. At the lower level are a series of components each responsible for an individual
task of the work request. The lower level components take direction from the high level component and
provide results to this component. The n-level hierarchy utilizes in its top and bottom levels the
components described for the two-level hierarchy. The middle levels are occupied by components that are
each responsible for a subgraph of the entire task graph. Therefore, a middle component takes direction
from and reports to a higher level component which is in charge of a part of the task graph that includes
the subgraph for which the middle component is responsible. The middle component also directs lower
level components, each of which is responsible for a particular task.

Another organizational approach utilizes a series of autonomous management components. Each
component is in charge of some subset of the tasks of a work request. Cooperation of the components is
required in order to realize the orderly completion of a work request.

Regardless of the organization, at some point, a request for the assumption of responsibility by a
component will be made. Such a request may be reasonably denied for two reasons: 1) the component does
not possess enough resources to satisfy the request (e.g., there may not be enough space to place a new
process on an input queue), or 2) the component may not be functioning. The question that arises
concerns how this denial is handled. One solution is to keep trying the request either until it is
accepted or until a certain number of attempts have failed. In this case, if the request is never
accepted, the work request is abandoned, and the user is notified of the failure. Instead of abandoning the
work request, it is possible that a new work distribution decision can be formulated utilizing the
additional knowledge concerning the failure of a certain component to accept a previous request.
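
The two ways of handling a denial can be combined in a short sketch: retry a bounded number of times, then fall back to the next candidate, remembering which components refused. The function place_task, the accepts callback, and the attempt limit are hypothetical names chosen for this illustration.

```python
def place_task(task, candidates, accepts, max_attempts=3):
    """Offer `task` to each candidate component in turn.

    accepts(component, task) -> bool models the component's answer; an
    overloaded or non-functioning component simply answers False.
    Returns (accepting component or None, list of components that refused).
    """
    refused = []
    for component in candidates:
        for _ in range(max_attempts):
            if accepts(component, task):
                return component, refused
        # Remember the refusal so a new distribution decision can avoid it.
        refused.append(component)
    return None, refused             # request abandoned; notify the user


attempts = []


def accepts(component, task):
    attempts.append(component)
    return component == "n2"         # n1 always refuses in this example


chosen, refused = place_task("t1", ["n1", "n2"], accepts)
abandoned, all_refused = place_task("t2", ["n1"], accepts)
```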

5.5  Process  Monitoring 

The task of monitoring process execution presents the FDPS executive control with two major
problems: providing interprocess communication, and responding to additional work requests and requests
for additional resources. With regard to the problem of interprocess communication, there is some
question as to the nature of the communication primitives an FDPS executive control should provide. This
question arises due to the variety of communication techniques being offered by current languages. There
are two basic approaches found in current languages, synchronized communication and unsynchronized
communication (buffered messages). Synchronized communication requires that the execution of both the
sender and the receiver be interrupted until a message has been successfully transferred. Examples of
languages utilizing this form of communication are Hoare's Communicating Sequential Processes [Hoar78]
and Brinch Hansen's Distributed Processes [Brin78]. In contrast, buffered messages allow the
asynchronous operation of both senders and receivers. Examples of languages using this form of communication
are PLITS [Feld79] and STARMOD [Cook80].

The executive control is required to provide communication primitives that are suitable to one of
the communication techniques discussed above. If the basic communication system utilizes synchronized
communication, both techniques can be easily handled. The problem with this approach is that there is
extra overhead incurred when providing the message buffering technique. On the other hand, if the basic
communication system utilizes unsynchronized communication, there will be great difficulty in realizing a
synchronized form of communication.

The task of monitoring processes also involves responding to requests generated by the executing
tasks. These may be either requests for additional resources (e.g., an additional file) or new work
requests. If the request is a work request, there is a question as to how a new set of tasks is to be
[...] included in the existing task [...]


5.6 Process Termination

When a process terminates there is always some cleanup work that must be accomplished (e.g.,
closing files, returning memory space, and deleting records concerning that process from the executive
control's work space). In addition, depending on the reason for termination (normal or abnormal), other
control components may need to be informed of the termination. In the case of a failure, the task graph
will contain the information needed to perform cleanup operations (e.g., the identities of the processes
needing information concerning the failure). Both the nature of the cleanup and the identity of the
control components that must be informed of the termination are determined from the design decisions
resulting from the issues discussed above.

5.7 Examples

To gain a better appreciation of some of the basic issues of control in an FDPS, it is useful to
examine an example of work request processing on an FDPS. In the example, emphasis is placed on the
operations involved in the construction of task graphs. The work distribution decision that is utilized
is a simple one that assigns the execution of processes to the same nodes that house the files containing
their code. The primary concern of this example (Figure 4) is the impact of variations in work requests
on task graph construction. In this example, the various parts of the overall task graph describing the
complete work request are stored on the nodes utilized by each part. Other techniques for storing the
task graphs may also be utilized. In the example, the following symbols are utilized:

[ ]          visible external reference(s)
( )          embedded external reference(s)
(n)A         responsibility for A delegated from node n
A(n)         responsibility for A delegated to node n
a-->b        IPC from process a to process b
A,B,...      uppercase letters indicate command files
a,b,...      lowercase letters indicate executable files
u,v,w,x,y,z  indicate data files


Now that we have taken a look at the construction of task graphs in a broad sense, let us examine
the details of the task of processing a work request. This is illustrated in two figures. Figure 15
outlines the basic steps involved in work request processing. Finally, Figure 16 depicts the steps
involved in processing a specific work request. In this case, the work request is the same as that
examined in the example of task graph building (Figure 14).


6. CONCLUSIONS

Thus far it has been possible to identify a number of the characteristics of a distributed and
decentralized control system and to identify some of its operational features. The evaluation of this
mode of system control is the next task.

7. ACKNOWLEDGMENTS

Much of the work reported on here has been performed by Timothy G. Saponas as part of his research work
for the Ph.D. degree. His area of primary interest is distributed and decentralized control. The work
has been performed as part of the Georgia Institute of Technology Research Program in Fully Distributed
Processing Systems. The support for this specific project was provided by the Department of the Air
Force, Rome Air Development Center, Griffiss Air Force Base, New York, under contract F30602-78-C-0120.

8. REFERENCES


Akin78  Akin, T. Allen, Flinn, Perry B., Forsyth, Daniel H., "A Prototype for an Advanced Command
Language," Proceedings of the 16th Annual Southeastern Regional ACM Conference (April, 1978):
96-102.

Balz71  Balzer, R. M., "PORTS - A Method for Dynamic Interprogram Communication and Job Control," AFIPS
Conference Proceedings 38 (1971 Spring Joint Computer Conference): 485-489.

Brin78  Brinch Hansen, Per, "Distributed Processes: A Concurrent Programming Concept," Communications
of the ACM 21 (November, 1978): 934-947.

Caba79a  Cabanel, J. P., Marouane, M. N., Besbes, R., Sazbon, R. D., and Diarra, A. K., "A Decentralized
OS Model for ARAMIS Distributed Computer System," Proceedings of the First International
Conference on Distributed Computing Systems (October, 1979): 629-536.

Caba79b  Cabanel, J. P., Sazbon, R. D., Diarra, A. K., Marouane, M. N., and Besbes, R., "A Decentralized
Control Method in a Distributed System," Proceedings of the First International Conference on
Distributed Computing Systems (October, 1979): 651-65?.

Ensl81  Enslow, Philip H. Jr., "Distributed Data Processing - What Is It?," AGARD Avionics Panel
Symposium on "Tactical Airborne Distributed Computing and Networks," Norway, (June 22-26, 1981).

Feld79  Feldman, J. A., "High Level Programming for Distributed Computing," Communications of the ACM
22 (June, 1979): 353-368.




Garc79  Garcia-Molina, H., "Performance Comparison of Update Algorithms for Distributed Databases,
Crash Recovery in the Centralized Locking Algorithm," Progress Report No. 7, Stanford
University, 1979.

Have78  Haverty, J. F., and Rettberg, R. D., "Inter-process Communications for a Server in UNIX,"
COMPCON Fall 78 (September, 1978): 312-315.

Hoar78  Hoare, C. A. R., "Communicating Sequential Processes," Communications of the ACM 21 (August,
1978): 666-677.

Jens78  Jensen, E. Douglas, "The Honeywell Experimental Distributed Processor - An Overview," Computer
(January, 1978): 28-38.

Morg77  Morgan, Howard L., and Levin, K. Dan, "Optimal Program and Data Locations in Computer
Networks," Communications of the ACM 20 (May, 1977): 315-322.

Ritc78  Ritchie, D. M., and Thompson, K., "The UNIX Time-Sharing System," The Bell System Technical
Journal 57 (July-August, 1978): 1905-1929.

Suns77  Sunshine, Carl, "Interprocess Communication Extensions for the UNIX Operating System: I. Design
Considerations," Rand Technical Report R-2064/1-AF, June, 1977.

Thom78  Thomas, Robert H., Schantz, Richard E., and Forsdick, Harry C., "Network Operating Systems,"
Bolt Beranek and Newman Report No. 3796 (March, 1978).

Zuck77  Zucker, Steven, "Interprocess Communication Extensions for the UNIX Operating System: II.
Implementation," Rand Technical Report R-2064/2-AF, June, 1977.


<work request> ::= [ <logical net> { ; <logical net> } ]

<logical net> ::= <logical node> { <node separator>
                  { <node separator> } <logical node> }

<node separator> ::= , | <pipe connection>

<pipe connection> ::= [ <port> ] '|' [ <logical node number> ]
                      [ .<port> ]

<port> ::= <integer>

<logical node number> ::= <integer> | $ | <label>

<logical node> ::= [ :<label> ] [ <simple node> |
                   <compound node> ] |
                   ( <simple node> | <compound node> )

<simple node> ::= { <i/o redirector> } <command name>
                  { <i/o redirector> | <argument> }

<compound node> ::= { <i/o redirector> } '(' <logical net>
                    { <net separator> <logical net> } ')'
                    { <i/o redirector> }

<i/o redirector> ::= <file name> '>' [ <port> ] |
                     [ <port> ] '>' <file name> |
                     [ <port> ] '>>' <file name> |
                     '>>' [ <port> ]

<net separator> ::= ;

<command name> ::= <file name>

<label> ::= <identifier>

Figure 1. Work Request Syntax
(Taken from [Akin78])



Figure 4. Example of a Work Request

Figure 5. Node Control Block


Name :  pgml 

Candidates: 


Figure  10.  Information  Gathering  (Resources  Required) 


Figure 12. Resource Allocation and Work Distribution

Figure 13. Work Assignment




Request = RUN A

In each step the figure also shows two further nodes with no task graph and no local resources.

Step 1

Node 1 (source of request)
    Task graph maintained at this node: [...]
    Local resources: A [c-->d], c [x], x
Node 2
    Task graph maintained at this node: [...]
    Local resources: d [y,z], y, z

Comments: A more complex request: 1) contains an explicit reference to IPC; 2) resource files are
located on different nodes. First layer is built.

Step 2

Node 1 (source of request)
    Task graph maintained at this node: A, with c-->d(2?) below it
    Local resources: A [c-->d], c [x], x
Node 2
    Task graph maintained at this node: c(1)-->(1?)d
    Local resources: d [y,z], y, z

Comments: File d is located on node 2 and responsibility for d is tentatively delegated to that node.

Step 3

Node 1 (source of request)
    Task graph maintained at this node: [...]
    Local resources: A [c-->d], c [x], x
Node 2
    Task graph maintained at this node: c(1)-->(1)d
    Local resources: d [y,z], y, z

Comments: Responsibility for d is accepted by node 2.

Step 4

Node 1 (source of request)
    Task graph maintained at this node: [...]
    Local resources: A [c-->d], c [x], x
Node 2
    Task graph maintained at this node: c(1)-->(1)d, with y and z below d
    Local resources: d [y,z], y, z

Comments: The graph below d is completed.

Figure  14.  Example 


Figure 15. Basic Steps in Work Request Processing

Figure 16. An Example of Work Request Processing




RECOVERY  IN  DISTRIBUTED  PROCESSING  SYSTEMS 

Liba  Svobodova 
INRIA
Rocquencourt
78153 Le Chesnay Cedex
France 


Abstract 

A powerful control abstraction called an atomic action has been developed as a general mechanism for
controlling accesses to shared distributed data. In order to preserve consistency of the system, if an atomic action
fails, all of its effects are undone; thus if a long complex computation is represented as an atomic action,
a significant amount of possibly useful work might be lost. The proposed scheme, which facilitates selective
internal recovery from detected errors, node failures, and communication failures, employs nested atomic
actions. When an atomic action terminates, its results are not made permanent until the outermost atomic
action is committed, but they survive local node failures. Each subtree of nested atomic actions is
recoverable (undoable) individually, thus making it possible to switch to an alternative algorithm, service, or
physical node upon a failure. Finally, a recovery point is established in stable storage as part of a remote
request, so that work done outside of the requesting node is not lost if this node fails.

1.  INTRODUCTION 

A  distributed  system,  as  viewed  in  this  paper,  is  a  network  of  computing  nodes  which,  although  they  have  to 
cooperate  in  some  predetermined  manner,  maintain  a  fair  degree  of  autonomy  with  respect  to  their  internal 
organization  and  management  [CLAR  80,  SVOB  79A,  SVOB  79B].  The  communication  subsystem  facilitates  exchange 
of  messages  between  any  two  nodes,  but  does  not  guarantee  it  at  all  times.  Individual  nodes  provide  certain 
services  to  the  rest  of  the  system.  These  services  are  not  memoryless  :  while  they  can  be  provided  only  if 
adequate  hardware  resources  are  available  at  the  node,  they  contain  another  critical  component,  and  that 
is  stored  data. 

Distributed systems are often claimed to be inherently more reliable than systems that are built on top
of a single central processor. First, propagation of low level errors is restricted by physical separation
of processes and resources. Second, if one node fails, it might be possible to finish the computing tasks
in progress by using services of another node. However, distributed systems also introduce new reliability
problems, the most basic one being the difficulty of maintaining a globally consistent state of the system.
Given that the programs of the individual tasks are correct, the problem of maintaining a consistent state
becomes a problem of synchronization and recovery. In a distributed system, the difficulty of recovery is
in part due, paradoxically, to the fact that a failure of a single node does not disable the whole system.

The other important aspect is the uncertainty brought about by the imperfect communication subsystem : from
the  point  of  view  of  a  node  requesting  a  service,  a  failure  of  the  communication  subsystem  to  deliver  the 
request  or  the  response  is,  in  general,  indistinguishable  from  a  failure  (crash)  of  the  node  providing  the 
service. 

The problem of recovery in a distributed system has been studied mostly in the context of database
management. A logical unit of work is represented by a transaction [TRAI 79, GRAY 80]. Transactions are assumed
to preserve certain application specific integrity constraints defined on the data, as well as the integrity
constraints of the data structures representing the database. To maintain the integrity constraints, if
an error is encountered during execution of a transaction, the transaction is aborted and all of its effects
are undone. A related aspect is that of the stability of results : if a transaction completes, its effects
are guaranteed to be permanent, that is, the results will not be lost or damaged by subsequent system
failures; this is again a recovery problem, although at a different (lower) system implementation level.

A computation in a distributed system may not be able to proceed normally for many different reasons :

-  the  invoker  decided  to  abort  it 

-  the  inputs  were  incorrect 

-  an  unrecoverable  hardware  or  software  error  was  encountered  during  execution 

-  a  scheduling  conflict  was  encountered 

-  one  of  the  involved  nodes  failed 

-  communication  failed. 

As said above, in the simple transaction model used in distributed database management systems, all of
these situations are treated in the same way : the transaction is aborted and its effects are undone. For
long, complex computations, a lot of work might be wasted if this policy is followed. Thus, in addition to
guaranteeing data integrity and stability, an important goal is to complete computations in spite of errors
and failures of different system components. In particular, since the same or a similar service might be
provided by several nodes, a failure of some node or a failure to communicate with a particular node does not
have to abort all computations requiring such a service. Also, if a failed node can recover in such a way
that it does remember the state of the computations that were running on it at the time of the crash, these
computations can be completed without having to seek alternative resources or alternative solutions. This
paper focuses on this problem of resiliency, and in particular, resiliency with respect to node and
communication failures.




2.  GENERALIZED  MODEL  OF  DISTRIBUTED  COMPUTATIONS 

The  transaction  model  assumed  in  most  studies  of  recovery  issues  in  distributed  database  management  systems 
is  limiting  from  yet  another  point  of  view  :  in  general,  database  transactions  are  assumed  to  have  a  very 
flat  (usually  just  one  level)  strictly  hierarchical  structure.  A  transaction  has  a  coordinator  and  several 
data  managers  (workers,  agents)  that  manage  different  parts  of  the  database,  but  there  are  no  lower  level 
dependencies  between  these  data  managers.  More  general  distributed  computations  might  present  the  sort  of 
problem  depicted  in  Figure  1.  In  this  example,  the  top  level  program  is  initiated  at  node  A.  This  program 
includes requests for service Si from node B, and service Sj from node C. A short notation Sm.N will be
used throughout this paper, where Sm specifies the service requested and N the name of the node providing
the service. The services provided by the individual nodes might be much more complex than "read data" and
"write data", usually assumed to be the only types of requests in database transactions. Following the
concepts of structured programming, the actual implementation of these services is unknown to the invoker.
Thus the program at node A does not know that requests Si.B and Sj.C both, as part of their implementation,
request services from node E and that each such request results in an update of a data object X.

If the programs that implement the services Si.B and Sj.C are executed concurrently, their proper
synchronization during normal execution represents practically the same problem as the problem of synchronizing
database accesses of independent concurrent computations. The problem that will be studied here is the
possibility of recovery of the individual requests. Assume that the request sent to node B fails, but by that
time node C has already done a significant amount of work as a result of the request received from node A.
As a response to a failure of the request Si.B, the requesting program at node A might try one of the
following alternatives :

1.  retry  request  Si.B 

2. search for another node that provides the same service as node B

3.  try  an  alternative  algorithm  (different  service)  that  produces  possibly  different  kinds  of  results, 
but  still  satisfactory  (less  accurate,  for  example). 

At the programming level, such alternatives could be specified with the aid of a construct called a recovery
block [RAND 75]. However, before an alternative can be tried at any level, it is necessary to restore the
state of the resources used by the failed branch of the computation. If object X has already been modified
as a result of the failed request Si.B, and if this modification has been seen by the other branch that
originated at node C, it might indeed be necessary to undo everything. The main point is, however, that
these dependencies are not known at the level of node A : unless some control mechanisms are added, it is
always necessary to account for the worst case, and to undo everything. It should be noted that this kind
of problem will be encountered even in a single processor system, if the "nodes" are just separate modules
such as, for example, the guardians [LISK 79, SVOB 79A]. However, additional problems occur in a network
of physical nodes, as will be seen later.

3.  ATOMIC  ACTIONS 

A general mechanism for solving the problem of consistency in the presence of concurrent computations and
asynchronous faults is a construct or a control abstraction called atomic action. From the point of view of
the  invoker,  an  atomic  action  is  an  operation  the  effects  of  which  are  determined  entirely  by  its  algorithm. 
Atomic  actions  are  : 

1.  indivisible  with  respect  to  concurrent  computations  :  the  intermediate  results  of  one  atomic  action 
cannot  be  modified  or  observed  by  concurrent  computations  ; 

2.  indivisible  with  respect  to  failures  :  an  atomic  action  either  terminates  normally  and  produces  a 
new  consistent  state  as  defined  by  its  algorithm,  or  has  no  effects. 

In  transaction-oriented  database  management  systems,  the  transactions  are  in  fact  atomic  actions;  however, 
the concept of an atomic action is more general than that of an update of a shared database.

From the implementation point of view, an atomic action can be viewed as a control sphere that encompasses
a set of resources, both shared and private. An atomic action can be executed by a single process, if all
these resources are in the same physical node, or it might involve several processes. The resources could
all be acquired at the beginning of the execution of the atomic action; however, often this is not possible
since the complete set of the required resources is not known at that time. For example, the "resources" might
be records of a database. Which records will be read or modified might depend on the value of certain fields
of some other records. One solution is to "acquire" the whole database. A more effective solution is to let
the atomic action acquire needed records during the course of execution, as the need is determined. This
necessitates synchronization protocols that properly order the elementary execution steps of different atomic
actions, and resolve scheduling anomalies. Basically, it is necessary to ensure that a set of atomic actions
executed concurrently is serializable [ESWA 76]. If an atomic action fails, the resources that it has acquired
have to be restored (recovered) to their state at the time of their acquisition, and released; to ensure that
no other computations have been affected by such a failure, the resources are not released until the atomic
action terminates.
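
The acquire-as-needed and restore-on-failure behaviour described above can be illustrated with a minimal
single-node Python sketch. The names Resource, AtomicAction and atomic_action are hypothetical and do not
come from the paper; one lock per resource stands in for a real synchronization protocol, and deadlock
between concurrent actions is not addressed.

```python
import copy
import threading
from contextlib import contextmanager


class Resource:
    """A shared data object guarded by its own lock (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


class AtomicAction:
    """Acquires resources as the need arises; holds them until termination."""

    def __init__(self):
        self._saved = {}  # resource -> its value at the time of acquisition

    def acquire(self, resource):
        if resource not in self._saved:
            resource.lock.acquire()
            self._saved[resource] = copy.deepcopy(resource.value)
        return resource

    def _finish(self, failed):
        for resource, old_value in self._saved.items():
            if failed:
                resource.value = old_value  # restore state at acquisition
            resource.lock.release()         # release only at termination
        self._saved.clear()


@contextmanager
def atomic_action():
    action = AtomicAction()
    try:
        yield action
    except BaseException:
        action._finish(failed=True)
        raise
    else:
        action._finish(failed=False)
```

A failing body leaves every acquired resource exactly as it was when acquired, and no other computation can
observe the intermediate values because the locks are held until the action terminates.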

Many sophisticated mechanisms have been proposed to provide atomicity in distributed systems. Serializability
of atomic actions is achieved either by locking protocols or by a priori ordering of requests belonging to
different atomic actions by associating with them globally unique timestamps. In this paper, only the
mechanisms needed to assure atomicity from the point of view of failures will be discussed. Also, while it would
be interesting to consider different types of resources, the resources of an atomic action are assumed to be
data objects. The key problems then are : i. coordination of the changes to the physical representation of
objects updated within the same atomic action, ii. their commitment, that is, making these changes
permanent and visible to other computations, iii. object recovery, that is, restoration of an object to its
previous state, and iv. coordination of the recovery of the objects modified by a failed atomic action.


*) A more general transaction model that covers situations of this kind is developed in [LIND 79]. However,
the emphasis in this model is on detecting node crashes, after which the whole transaction is aborted.
Also, a transaction can be executing only on a single node at a time.

These are non-trivial problems even if all objects are stored at the same node and the atomic action
involves only a single process; in a distributed system, the inherent uncertainty and the cost of internode
communication add another dimension to this problem.

In order to be able to execute arbitrary computations as atomic actions, it is necessary that the elementary
steps of which these computations are constructed are also atomic. In particular, physical updates of data
on storage devices must be atomic. In general, to guarantee that stored data will survive node crashes, they
must be stored on non-volatile secondary storage devices, since the usual recovery from a crash is to
reinitialise the system, which means that from the point of view of normal access, the previous content of the
primary memory is effectively lost.*) But such storage is not yet stable; additional procedures and
mechanisms (e.g. duplication, checkpoints + log) are needed in order that stored information survives device
crashes and spontaneous decays. However, the system could crash during a write operation, when part of the
data has already been overwritten with a new value; this would leave the data object in an undefined state.
Stable storage that guarantees that a write operation is either performed correctly or has no effects is
called atomic stable storage. Efficient implementation of a storage system with such properties is still
a research issue [LAMP 79, SVOB 80]; in this paper, it is assumed that all nodes provide stable storage and
that information stored there can be changed atomically, although it does not necessarily mean that such
information is updated in place.
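
As an illustration of a write that is "either performed correctly or has no effects" without updating in
place, the following Python sketch writes the new value to a temporary file and then renames it over the old
copy. This is only one possible technique and is not taken from the paper; it relies on the atomicity of
os.replace within one file system, and it covers the atomicity of a write only, not survival of device
failures. The function names atomic_write and read_value are hypothetical.

```python
import json
import os
import tempfile


def atomic_write(path, value):
    """Replace the contents of `path` with `value`, all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(value, tmp)    # a failure here leaves the old copy untouched
            tmp.flush()
            os.fsync(tmp.fileno())   # push the new copy to the device
        os.replace(tmp_path, path)   # atomic switch from old copy to new copy
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)      # discard the incomplete new copy
        raise


def read_value(path):
    with open(path) as f:
        return json.load(f)
```

If the write fails part-way, a reader still sees the previous value, since the old copy is never overwritten.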

4.  NESTED  ATOMIC  ACTIONS 

From the recovery point of view, atomic actions can be viewed as a damage confinement mechanism : while it
is generally assumed that everything within the failed atomic action is suspect, the mutual exclusion
mechanisms of atomic actions guarantee that nothing outside has been affected. The damage confinement is a very
useful property since it makes computations that are implemented as atomic actions separately recoverable.
However, as already argued, the assumption about the damage within an atomic action is often unnecessarily
strict.

An alternative to aborting the entire atomic action is to set up recovery lines within it : when an error
is detected, the computation has to be backed out only to the nearest recovery line. If an atomic action
involves just a single process, a recovery line consists of a single recovery point (checkpoint) that
contains the state of that process. If several processes are involved, then recovery lines can be either
prearranged, or determined dynamically. The beginning of an atomic action represents a preplanned recovery line.
However, if processes do not set up recovery lines in a coordinated manner, it is not obvious where the
nearest recovery line is at the time when an error is detected. Merlin and Randell developed "chase protocols" for
determining recovery lines dynamically [MERL 77]. This work was extended by Wood, who worked out a protocol
for keeping track of the dependences between processes (propagation of information) and for determining
when it is safe to discard a particular recovery point [WOOD 81]. The approach taken here is essentially to
preplan the recovery structure, and to tie it to the logical structure of the program.**)

The basic solution is to use nested atomic actions : each atomic action can be built of smaller atomic actions
that can be executed either sequentially or in parallel, and that will be properly synchronised with respect
to use of shared data objects. Reed developed an integrated set of mechanisms for implementation and control
of nested atomic actions [REED 78]; these mechanisms will be extended here to facilitate selective internal
recovery.

In Reed's model, each atomic action is represented by two entities : a pseudo-temporal environment and a
commit record. The pseudo-temporal environment is the mechanism that assures serialisability of atomic
actions. The commit record is a data structure that contains the state of the atomic action. The commit record
is created with the state set to "undefined". When the atomic action terminates normally, the state is set
to "committed"; otherwise, if the termination is abnormal, the state in the commit record is set to "aborted".
An atomic action aix which is nested within an atomic action ai is made dependent on the outcome of ai : this
dependence is recorded in the commit record of aix in the form of a reference to the commit record of ai.
Commit records are stored in atomic stable storage. Finally, all requests to create, read, update, or delete
an object include a reference to the commit record of the atomic action within which the request is made.

When an object is updated, the system creates a new stable version of this object without destroying the old
one. This version contains a reference to the commit record of the atomic action under which it was created.
As long as that commit record is in the state "undefined", only the atomic action that created that version
can read it. Once this atomic action terminates, its commit record is set to the state "committed", but this
does not mean that the new version can be used freely from anywhere within the system : its fate still depends
on the outcome of the enclosing atomic actions. However, once committed locally, a new version can be used
from anywhere within the invocation subtree rooted by the nearest enclosing atomic action that is still in
the state "undefined", since if this atomic action is eventually aborted, all of its dependents will be
aborted anyway. When an atomic action is aborted, all of the object versions created by it and by all of its
dependents are discarded, but this does not affect other branches of the invocation tree, since they could not
have seen the invalidated versions. Once the top level atomic action reaches the final state, be it "aborted"
or "committed", this information is propagated to its dependents and successively to their dependents and
encached in their commit records.
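
The visibility rule for object versions can be made concrete with a small Python sketch. It is a simplified
in-memory illustration, not Reed's implementation : the class CommitRecord and the function can_read are
hypothetical names, pseudo-temporal environments are omitted, and actions nested inside the creating
action are treated as part of it.

```python
class CommitRecord:
    """Commit record of one atomic action (held in atomic stable storage in the paper)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # reference to the enclosing action's commit record
        self.state = "undefined"    # later "committed" or "aborted"

    def enclosing(self):
        """Yield this record, then each enclosing record, innermost first."""
        record = self
        while record is not None:
            yield record
            record = record.parent

    def outcome(self):
        """Fate of this action's results, taking all enclosing actions into account."""
        states = [record.state for record in self.enclosing()]
        if "aborted" in states:
            return "aborted"
        if all(state == "committed" for state in states):
            return "committed"
        return "undefined"


def can_read(reader, creator):
    """May atomic action `reader` read a version created under `creator`?"""
    if creator.outcome() == "aborted":
        return False                     # the version has been discarded
    root = creator
    while root is not None and root.state == "committed":
        root = root.parent               # skip levels that are locally committed
    if root is None:
        return True                      # committed at the top level : visible everywhere
    # Visible only inside the subtree rooted at the nearest "undefined" action.
    return any(record is root for record in reader.enclosing())
```

With a top level action at node A and dependents for Si.B and Sj.C, a version created under a request made by
B becomes readable by the branch of C only after both that request and Si.B have committed locally.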


*) It is quite difficult to find a simple definition of "system crash"; in this paper, it will be assumed
that a crash is any event that causes such complete reinitialisation.

**) A similar approach is used by Shrivastava, but he assumes that recovery might be provided on a more
abstract level, under the direction of a manager of an abstract type [SHRI 81].



Let us return to the example given in Section 2. The main program at node A will be, of course, an atomic
action, but in addition each remote request will start a new atomic action in the receiving node. It is
assumed that each request returns a response when the atomic action created by that request terminates.
It is the responsibility of the requestor to wait for the response before its atomic action is committed.
Now assume that the request Sj.E from node B is the first one to arrive at node E ; this situation is
depicted in Figure 2. Once the execution of this request is finished, it is possible to process the request
Sk.E from node D, but not the request to node E from node C, since the atomic action rooted at node B has
not finished. Figure 3 shows a situation when Si.B terminated normally and a new version of object X has
finally been created by the request from node C. If the request Si.B failed for some reason, both versions
X1 and X2 would be discarded before the request from node C could proceed. Of course, it is assumed that
there do not exist any precedence constraints between the updates performed on X, otherwise the requests
Si.B and Sj.C could not be executed concurrently without any explicit synchronization on their level.

5.  CRASH  RECOVERY 

The mechanisms described in the preceding section are sufficient for orderly recovery from errors that are
either reported or can be safely detected by the invoker of a request for service. In the given example,
it would mean that if the request Si.B fails, either node B sends an error message to node A or A detects
an erroneous response. In either case, A receives some response from B. As said earlier, object versions
and commit records are stored in stable storage, thus they survive node crashes. This means that if, for
example, node E crashes after it has sent back a response to the request Sk.E, this crash can have no effect
on the results of that particular call. However, if an invoker does not receive a response to its request,
the situation becomes more complicated. Namely, to prevent a node from waiting indefinitely for a response
from another node, it is necessary to set a timeout for each remote request. However, when the timeout
expires, it is not possible to deduce the state of the atomic action created by that request. Any of the
following might have happened :

1. the target node Z never received the request (the communication subsystem did not deliver the
message)

2. the request was executed but terminated abnormally

3. execution of the request was interrupted by a crash of node Z

4. execution of the request terminated normally, but the response was not delivered to the requestor
(either the node Z crashed before the response could be sent, or the communication subsystem failed
to deliver the message)

5. execution of the request still continues (either the timeout was set too short or the execution
is slower due to high load or the need to recover from internal errors).

When the timeout expires, the invoker may decide either to repeat the request or to try an alternative service
or an alternative algorithm. Let us postpone the discussion of the first possibility until the next section
and analyze the problem of switching to an alternative. For the first two situations listed above nothing
special has to be done since the failed request had no effects. In the other three cases, the atomic action
started by the request either has been or might be locally committed (in case 3, this assumes that the
node recovers in such a way that it is capable of resuming the computations interrupted by the crash). Its
commit record contains a reference to the commit record of the directly enclosing atomic action, that is,
the atomic action of its invoker; later, when the state of the commit record of the invoker is set to
"committed", the whole subtree abandoned when the invoker switched to an alternative algorithm would in
fact be committed! Thus it is necessary in some way to invalidate the reference in the commit record of a
dependent atomic action declared to have failed on the basis of a timeout. The commit record of an atomic
action should reside on the same node as the objects manipulated by the atomic action, that is, in the given
model, on the node on which the atomic action is executed. This means, however, that if no response is
received from this node, it must be assumed that the commit record, if it exists, is also inaccessible and
therefore the reference to the commit record of the invoker cannot be removed. A possible solution, shown in
Figure 4, is to add to each commit record a list of the identifiers of the current dependent atomic actions.
In addition, each atomic action will contain its own id in its commit record. The identifier of a dependent
atomic action is generated by the invoker (although it could be generated by some third party) and included
in the request sent to the node providing the service. When the invoker decides that a particular request
has failed, it removes its id (that is, the id of the atomic action that might have been started by the
request) from the list in its own commit record. Before the results of a dependent atomic action aix can be
committed up to the level of its invoker ai, it is necessary to check if the identifier of aix is still on
the list in the commit record of ai.
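
The check described above can be sketched in a few lines of Python. This is an illustrative in-memory model
only : CommitRecord, new_dependent_id, abandon and can_commit_upward are hypothetical names, uuid4 is an
arbitrary choice for generating identifiers, and the stable storage of commit records is not modelled.

```python
import uuid


class CommitRecord:
    def __init__(self, action_id, parent=None):
        self.action_id = action_id   # the action's own id, kept in its commit record
        self.parent = parent         # reference to the invoker's commit record
        self.state = "undefined"
        self.dependents = set()      # ids of the current dependent atomic actions

    def new_dependent_id(self):
        """The invoker generates the id and records it before sending the request."""
        dependent_id = uuid.uuid4().hex
        self.dependents.add(dependent_id)
        return dependent_id

    def abandon(self, dependent_id):
        """Called when the invoker declares the request failed (e.g. on timeout)."""
        self.dependents.discard(dependent_id)


def can_commit_upward(dependent):
    """A dependent's results count only if the invoker still lists its id."""
    invoker = dependent.parent
    return (dependent.state == "committed"
            and invoker is not None
            and dependent.action_id in invoker.dependents)
```

An abandoned branch may still finish and commit locally, but because its id is no longer on the invoker's
list, a later commit of the invoker does not commit it.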

If a node is to resume local computations interrupted by a crash, and this is important in particular when
a computation had made remote requests, it is necessary for each such computation to remember not only its
local state, but also its interactions with other nodes. Some of this information is already in the commit
record; however, it is also necessary to remember the outstanding requests. Thus a checkpoint should be
established in stable storage as part of a remote request. A remote request thus should include the following
steps :

P1 :  i. the invoker generates a new identifier ID for the dependent atomic action

ii. this identifier is included in the list in the commit record of the invoker ; the commit
record is updated atomically in stable storage

iii. a checkpoint is made which includes a reference to the commit record and the message to be sent

iv. the message which includes the identifier ID and a reference to the commit record of the invoker*)
is sent to the target node

v. on failure : remove ID from the list in the commit record of the invoker ; the commit record is
updated atomically in stable storage

*) A reference to a commit record could be an identifier of the actual object that represents the commit
record or the identifier of the atomic action represented by that commit record.




If the node crashes after the checkpoint but before another checkpoint is established, the request will be
resent; thus the target node must be able to detect when a received request is a duplicate. Although many
communication subsystems detect and suppress duplicate messages, their mechanisms are not sufficient, since
from the point of view of the communication subsystem, each retry represents a different message. However,
if the request has been previously received, then the receiving node must contain a commit record with that
ID; detection of a duplicate is therefore simple. Finally, if the invoking node crashes during step v but
before the ID has been removed from the list, again the request will be resent; at this time, it might
actually succeed, if the "failure" detected previously was a result of a timeout, but this does not cause any
inconsistency.
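
The remote request steps and the duplicate detection at the target node can be illustrated with the following
Python sketch. It is a toy in-memory model under stated assumptions : the classes Invoker and TargetNode and
all method names are hypothetical, attributes that the paper keeps in stable storage are plain Python
attributes here, and the service itself is reduced to a counter.

```python
import itertools


class TargetNode:
    def __init__(self):
        self.commit_records = {}   # ID -> state ; stands in for stable storage
        self.executions = 0

    def handle(self, message):
        request_id = message["id"]
        if request_id in self.commit_records:
            # A commit record with this ID exists : the request is a duplicate.
            return ("duplicate", self.commit_records[request_id])
        self.commit_records[request_id] = "undefined"
        self.executions += 1       # the service is executed exactly once
        self.commit_records[request_id] = "committed"
        return ("done", "committed")


class Invoker:
    _counter = itertools.count(1)

    def __init__(self, name):
        self.name = name
        self.dependents = []       # list kept in the invoker's commit record
        self.checkpoint = None     # last checkpoint made as part of a request

    def prepare(self, service):
        dependent_id = f"{self.name}-{next(self._counter)}"          # step i
        self.dependents.append(dependent_id)                         # step ii
        message = {"id": dependent_id, "invoker": self.name, "service": service}
        self.checkpoint = {"commit_record": self.name, "message": message}  # step iii
        return message

    def send(self, target, message):                                 # step iv
        return target.handle(message)

    def resend_after_crash(self, target):
        """After a crash the request is resent from the checkpoint, same ID."""
        return self.send(target, self.checkpoint["message"])

    def on_failure(self, dependent_id):                              # step v
        self.dependents.remove(dependent_id)
```

Because the resent message carries the ID stored in the checkpoint, the target node recognises it and does
not execute the service a second time.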

6.  PROGRAMMING  ASPECTS 

As already mentioned in Section 2, a programming construct called a recovery block can be used to specify
the alternatives to be tried in case a particular request fails. The structuring imposed by recovery
blocks also provides another solution to the problem of branches abandoned as a result of a timeout,
discussed in the preceding section. Figure 5 shows a possible structure of the program running in node A that
uses recovery blocks. A remote procedure call is used as a means for making remote requests. Since when such a
call is made, the calling process must wait for a response, in order to be able to process requests Si.B and
Sj.C in parallel, it is necessary to make the respective calls in different processes in node A. In the given
example, this is indicated by the enclosing parbegin/parend structure, although processes could be forked in
a more general manner, for example, just before a remote call is made. It is assumed that a timeout is
associated with each remote call; if the timeout expires, the call terminates by signalling an exception. This
exception and any other abnormal return, if not handled within the enclosing block, will result in a switch
into an alternative program within the same recovery block. If all alternatives fail, failure is signalled
to the next enclosing block, which, in this case, is the topmost level. Since no alternative is specified at
this level, the whole computation would be aborted.

According to the semantics of recovery blocks, before an alternative can be tried at any level, it is
necessary to return to the initial state of the recovery block, that is, undo what has been done by the failed
alternative. Considering that it is also necessary to coordinate accesses to shared resources from different
recovery blocks executed in different processes, each alternative of a recovery block should be, in fact, a
separate atomic action. Figure 6 shows the new tree of commit records for the same execution state as the
one that was depicted in Figure 2 : additional commit records were added for the recovery blocks that enclose
the individual remote calls.
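
A recovery block whose alternatives each behave as a separate atomic action can be sketched in Python as
follows. This is a simplified single-process illustration and not the construct of [RAND 75] itself : the
function recovery_block and the exception AllAlternativesFailed are hypothetical names, the state is a plain
dictionary, and a private copy of the state stands in for the versions and commit record of each alternative.

```python
import copy


class AllAlternativesFailed(Exception):
    """Signalled to the next enclosing block when no alternative succeeds."""


def recovery_block(state, alternatives):
    """Try each alternative in turn, each starting from the initial state."""
    for alternative in alternatives:
        trial = copy.deepcopy(state)   # private versions of the objects
        try:
            alternative(trial)
        except Exception:
            continue                   # "aborted" : the trial versions are discarded
        state.clear()
        state.update(trial)            # "committed" : results become visible
        return alternative.__name__
    raise AllAlternativesFailed()
```

A timeout on a remote call would surface here as an exception raised by the alternative; whatever the
alternative had changed before the exception is discarded with the trial copy.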

When a remote call fails, then in order to abandon that particular branch, the alternative from which the
call was made is abandoned also, and its commit record is set to "aborted". When another alternative is tried,
a new commit record is created for it. Thus even though the remote request might be finished later (in case
the remote call failed because of a timeout, after possibly several retries), its results can never
become erroneously committed. This means that it is not necessary to keep the list of current dependent atomic
actions in the commit record, as proposed in the preceding section. Or, viewed differently, this list now
consists of the commit records of the current alternatives.

Let us return now to the question of what has to be done if, after a timeout, the remote call is retried.
It might seem that this is the same problem as if the request was resent as part of recovery from a crash,
but the situation here is a little more complicated. At this level, whether or not to retry a request is
the decision of the programmer. If it is the programmer who, in order to send a request to another node, has
to write the individual steps of the program P1 outlined in the preceding section, then the request can be
resent in the following way:

P2 :  o.   set retry := n
      i.   get new ID
      ii.  create a checkpoint which includes a reference to the commit record*) of the invoker and
           the message to be sent
      iii. send the message, which includes ID and a reference to the commit record of the invoker
      iv.  on timeout : retry := retry - 1
                        if retry ≥ 0, repeat step iii
                        else failure

The request will be retried up to n times, each time with the same ID; thus this is indeed the same problem
as if the request is retried after a crash. On the other hand, the programmer could be given a primitive
"remote-call" which consists of the steps i to iii of P2. In order to retry a request, it is necessary to
repeat the call:


P3 :  0.   set retry := n
      1.   remote-call (service, node, parameters)
      2.   on timeout : retry := retry - 1
                        if retry ≥ 0, repeat step 1

Each time the remote-call is repeated, a new ID is generated; thus for the receiving node, a repeated call
looks like a new request. This means that the effects of the previous try, if the request was indeed received
and executed, must be undone. Thus, in connection with recovery blocks, the whole alternative of the recovery
block that contains the call should be repeated. A more graceful solution is to provide a remote-call
primitive that includes the option of an automatic retry, that is, in its implementation it includes the steps
o and iv of P2. Thus the language should provide a primitive remote-call (service, node, parameters, n) where
n is the number of retries desired.*)

*) The commit record still must be included in the checkpoint, since it is part of the state of a
computation.
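The difference between retrying with the same ID and issuing a fresh call can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's implementation: `Node`, `Timeout` and the reply-loss counter are hypothetical, the checkpoint of step ii is omitted, and the receiving node is assumed to suppress duplicates by request ID.

```python
import itertools


class Timeout(Exception):
    """Raised when no reply arrives for a request."""


class Node:
    """Toy receiving node that remembers which request IDs it has executed."""

    def __init__(self, lose_first_replies=0):
        self.executed = {}                  # request ID -> result
        self.executions = 0                 # how many times a service really ran
        self.lose = lose_first_replies      # number of replies to drop

    def handle(self, request_id, service, parameters):
        if request_id not in self.executed:         # duplicate suppression by ID
            self.executions += 1
            self.executed[request_id] = service(*parameters)
        if self.lose > 0:                           # simulate a lost reply
            self.lose -= 1
            raise Timeout()
        return self.executed[request_id]


_ids = itertools.count(1)


def remote_call(service, node, parameters, n):
    """Steps o, i, iii and iv of P2: one ID, at most n retries after the first send."""
    request_id = next(_ids)                 # step i: get new ID
    retry = n                               # step o: set retry := n
    while True:
        try:
            return node.handle(request_id, service, parameters)   # step iii
        except Timeout:
            retry -= 1                      # step iv: retry := retry - 1
            if retry < 0:
                raise                       # failure is signalled to the caller
```

Because every retry carries the same ID, the service body runs once even when several replies are lost; calling `remote_call` again from the program, as in P3, would draw a new ID and look like a new request to the node.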

Many arguments have been raised recently with respect to the basic communication primitives for a
distributed system, the primary aspect being the choice between remote procedure calls and more general
send and receive primitives [LISK 79, LAUR 79]. Although in order to achieve the desired concurrency a
separate process has to be forked for a remote call, this combination seems to provide a cleaner
structure, particularly from the point of view of recoverability. The same effect could, of course, be achieved
with two separate send and receive primitives, but if the send and receive parts of different requests are
interleaved, it will be more difficult to determine the proper recovery structure. It should be noted that
in the context of the recovery model presented here, a remote node must reply to the requestor even if no
data is sent in the reply; thus having a simpler send primitive that does not wait for a response does not
provide any advantage. However, both the recovery model and the communication primitives require further
study.

CONCLUSION 

The concept of an atomic action as a general mechanism for controlling recovery in computer systems, and
particularly in distributed systems, is gaining more and more acceptance. Of course, there is always the
problem of cost. The heavy use of stable storage and the extra messages needed to test dependencies of
nested atomic actions and to coordinate their commitment or abortion can be very expensive. However, if a
very reliable system is needed, alternative mechanisms might be equally expensive. Atomic actions have
some strong advantages. They provide a uniform scheme for coping with either local or remote failures.
Nested atomic actions naturally support common programming techniques. What is needed is more
experimental work that uses these concepts to demonstrate that it is indeed feasible to build in this way not
just a very reliable but also a practical system.


REFERENCES

CLAR 80    Clark, D.D., Svobodova, L., "Design of Distributed Systems Supporting Local Autonomy",
           Digest of Papers, COMPCON Spring '80, San Francisco, California, February 1980, pp. 438-444.

ESWA 76    Eswaran, K., et al., "The Notions of Consistency and Predicate Locks in a Database System",
           Comm. of ACM, Vol. 19, N° 11 (November 1976), pp. 624-633.

GRAY 80    Gray, J., "A Transaction Model", Lecture Notes in Computer Science, Springer-Verlag, Vol. 65,
           July 1980, pp. 282-298.

LAMP 79    Lampson, B., Sturgis, H., "Crash Recovery in a Distributed Data Storage System", XEROX PARC,
           Palo Alto, California, 1979 (to appear in Comm. of ACM).

LAUR 78    Lauer, H.C., Needham, R.M., "On Duality of Operating System Structures", Proc. of Second
           International Symposium on Operating Systems, IRIA, Rocquencourt, France, October 1978.

LIND 79    Lindsay, B.G., et al., "Notes on Distributed Databases", IBM Research Laboratory Technical
           Report N° RJ2571, San Jose, California, July 1979.

LISK 79    Liskov, B., "Primitives for Distributed Computing", Proc. of 7th ACM Symposium on Operating
           Systems Principles, December 1979, pp. 33-42.

MERL 77    Merlin, P.M., Randell, B., "Consistent State Restoration in Distributed Systems", Technical
           Report N° 113, University of Newcastle upon Tyne, Newcastle upon Tyne, England, October 1977.

RAND 75    Randell, B., "System Structure for Software Fault Tolerance", IEEE Transactions on Software
           Engineering, Vol. SE-1, N° 2 (June 1975), pp. 220-232.

REED 78    Reed, D.P., "Naming and Synchronization in a Decentralized Computer System", MIT Laboratory
           for Computer Science Technical Report 205, Cambridge, Massachusetts, September 1978.

SHRI 81    Shrivastava, S.K., "Structured Distributed Systems for Recoverability and Crash Resistance",
           IEEE Transactions on Software Engineering, July 1981 (to appear).

SVOB 79A   Svobodova, L., Liskov, B., Clark, D., "Distributed Computer Systems: Structure and Semantics",
           MIT Laboratory for Computer Science, Technical Report N° TR-215, Cambridge, Massachusetts,
           March 1979.

SVOB 79B   Svobodova, L., "Reliability Issues in Distributed Information Processing Systems", Proc. of the
           Ninth IEEE Fault Tolerant Computing Symposium, June 1979, pp. 9-16.

SVOB 80    Svobodova, L., "Management of Object Histories in the SWALLOW Repository", MIT Laboratory for
           Computer Science Technical Report 243, Cambridge, Massachusetts, July 1980.

TRAI 79    Traiger, I.L., et al., "Transactions and Consistency in Distributed Database Systems", IBM
           Research Laboratory Technical Report RJ 2555, San Jose, California, June 1979.

WOOD 80    Wood, W.G., "Recovery Control of Communicating Processes in a Distributed System", University
           of Newcastle upon Tyne, Technical Report N° 158, Newcastle upon Tyne, G.B., November 1980.


*) Note that if it is indeed desired to start a new atomic action on a retry, it is still possible to use
the sequence P3, where the last parameter in the remote-call is set to 0.


Figure 1: Example of a distributed computation


Figure 2: State of the object X and of the commit records
of the enclosing atomic actions before the
termination of the request S,.E


Figure 3: Situation during the execution of the request S^.E


Figure 4: Same state of execution as in Figure 2; identifiers
of atomic actions were added for crash recovery


A: ensure A.test
   by AO: parbegin
            AB: ensure AB.test
                by AB1: begin
                          remote_call(S^, B, parameters);
                        end
                else by AB2: begin
                          remote_call(SJ, B', parameters);
                        end
                else error

            AC: ensure AC.test
                by AC1: begin
                          remote_call(S^, C, parameters);
                        end
                else by ...
                else error
          parend
   else error


Figure  5:  Structure  of  the  program  executed  at  node  A  that 
uses  recovery  blocks 




Figure 6: Same state of execution as in Figure 2; each remote
request is made from a separate recovery block




19-1 


GENERALIZED POLLING ALGORITHMS FOR DISTRIBUTED SYSTEMS
Jack Keil Wolf

Department of Electrical and Computer Engineering
University of Massachusetts
Amherst, Massachusetts 01003 USA


ABSTRACT 

A polling algorithm for a distributed system is an algorithm which can be run simultaneously at all
terminals in a network and whose aim is to determine which terminals have a positive response
to a specific query. Of particular interest is the situation where one expects very few of the terminals
to respond positively and where a terminal signifies a negative response by not transmitting at all. In
such a case it is inefficient to poll the terminals in a round-robin manner. A more efficient procedure
is to group the terminals into subsets in which all terminals in a subset are queried simultaneously.
Then if all respond negatively no further queries need be addressed to that subset. If the responses from
the terminals in the subset are mixed then this subset is further subdivided into smaller subsets until
the responses of all the terminals are determined.

In this paper two distinct algorithms for polling are considered. In both algorithms, the terminals of the
network are represented by leaves in a binary tree and the subsets are subtrees in the overall tree. The
two systems differ in the assumptions made regarding the types of responses sent and how the responses are
interpreted. The performances of these two schemes are compared with each other and with ordinary
round-robin polling.

1.  INTRODUCTION 

Consider a set of communication terminals (or nodes) which communicate over a common communication channel
and for which every terminal can reliably receive the transmission of every other terminal. Suppose that
a query is to be made of all terminals in the network and that it is desirable for every terminal to know
the yes/no response of every other terminal to the particular query. Furthermore assume that because of
reliability considerations it is undesirable to use a centralized algorithm at one terminal to conduct
this query but rather a distributed algorithm which is simultaneously run at all the terminals must be used.
Finally, assume that the network is in synchronism and that all terminals know of the response of all other
terminals.

The most straightforward method of accomplishing this task is via a round-robin polling technique whereby
all of the terminals respond to the query in some pre-determined order using a time-division multiplexing
technique. If we have N terminals we would require N time slots, one dedicated to each terminal. The
time slot must be of sufficient duration to carry the response of a terminal. We now make two assumptions,
the result of which is to render the round-robin technique inefficient. We first assume that a terminal
indicates a negative response to a query by transmitting nothing at all. This method of indicating a
negative response is quite common in a network, especially when radio silence is important. The second
assumption is that very few of the terminals will respond positively to the query. Our aim here is to
investigate alternative distributed schemes which are more efficient than the round-robin scheme when these two
assumptions hold.

The basic approach is to break the set of terminals into subsets and to query simultaneously all terminals
in a subset. Then if no transmission is received from any terminal in the subset, all terminals in that
subset are known to have responded negatively. If, however, one or more positive responses are received,
further queries of the terminals in that subset in general are required. The querying is done by further
subdividing the terminals in that subset into smaller subsets. All terminals in the network are able to
know which subset is being queried at any time since they all receive all responses and can use these
responses to drive a common algorithm which prescribes exactly which terminals are being queried. Thus no
actual questions need be transmitted. Only the answers to the implied questions are transmitted over the
communications channel.

Two different algorithms are explored in this paper. The first algorithm was originally suggested by Hayes
(Hayes, J.F., 1978) and assumes that a terminal which desires to respond positively to a query transmits
energy over the channel. If a group of terminals is simultaneously queried and energy appears on the
channel in the slot allocated to the response, then all terminals know that at least one of the terminals
in the subset queried answered affirmatively to the query. The details of this algorithm are described in
the next section (Section 2) along with a sketch of the analysis of this algorithm.

The original Hayes algorithm asks some questions of groups of terminals, the answer to which could have
been predicted before the questions were asked. These redundant questions can be skipped without any loss
in performance. The subsequent section (Section 3) details a modification to the Hayes algorithm achieved
by skipping redundant queries and an analysis of the improved algorithm.

The next section (Section 4) describes a new algorithm (Gudjohnsen, E. et al., 1980) for which fewer
queries are required but for which more complicated answers are required. Now each terminal in the
network is given a unique signature (or address) and if the terminal wishes to respond affirmatively it
transmits its signature in the appropriate time slot. Now if a subset of the terminals is queried, if none or
one of the terminals responds positively, the status of all terminals in the subset can be determined.
(If one responds, the identity of that one can be determined by reading its signature; all others have a
negative response.) Furthermore if two or more terminals simultaneously respond we assume that the
signatures of all transmissions are garbled but that all receivers recognize that a garbled set of signatures
was received so that they know there were two or more positive responses in the subset. In such a case, if
the subset contains more than two terminals, a further subdivision is required. If the subset contains
exactly two terminals no further subdivision is required since the garbled response must have been the
result of both terminals transmitting their signatures. Various analyses are performed for this system.
First the average number of responses is calculated. Then the average number of bits in these responses
is calculated using two different approaches.

In the sections to follow we will make the following common assumptions:

(1) The number of terminals N is a power of 2; i.e., $N = 2^k$. Thus where signatures are assigned,
each signature is k bits long.

(2) For every terminal, the probability that the terminal wishes to respond positively is given by
the parameter p, 0 < p < 1. (Note that p is assumed the same for each terminal.)

(3) The random variables describing the responses of all N terminals to any query are statistically
independent. Thus, the probability that exactly i of the N terminals wish to respond positively to a query
is given by the formula

$$\binom{N}{i} p^i (1-p)^{N-i} \qquad \text{for } i = 0, 1, 2, \ldots, N.$$
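As a quick numerical illustration of assumption (3), the binomial expression can be evaluated directly. The sketch below is not part of the original paper; the function name `prob_exactly` and the values N = 16 and p = 0.1 are illustrative assumptions.

```python
from math import comb


def prob_exactly(i, n, p):
    """Probability that exactly i of n independent terminals respond positively."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)


# Illustrative values: N = 16 terminals, each positive with probability p = 0.1.
distribution = [prob_exactly(i, 16, 0.1) for i in range(17)]
print(distribution[0])       # probability that no terminal responds positively
print(sum(distribution))     # the probabilities over i = 0..N sum to 1
```

When p is small, most of the probability mass sits at small i, which is the regime in which querying whole subsets at once pays off.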

To illustrate the steps followed in each of the algorithms we consider the following common example. Assume
there are 16 terminals denoted (0, 1, 2, ..., 15). To a particular query, terminals 1, 10 and 11 wish to
respond positively and all other terminals choose to respond negatively by preserving radio silence. For
convenience, we show in Figure 1 all 16 terminals as the leaf nodes of a binary tree. These nodes are
identified by the symbols 0, 1, ..., 15 while the internal nodes are identified by the letters A, B, ..., Q
(with I and O omitted to avoid confusion with the integers 1 and 0). The asterisks next to leaf nodes 1,
10 and 11 indicate that they respond positively. All other leaf nodes respond negatively.

2.  THE HAYES ALGORITHM (Hayes, J.F., 1978)

Hayes described two different versions of his algorithm which he termed non-adaptive and adaptive. We
begin with a discussion of the non-adaptive version, since although the adaptive version is important from
a practical standpoint, its understanding follows easily from the non-adaptive case.

As in the example depicted in Figure 1, the $N = 2^k$ terminals are identified with the $2^k$ leaves of a binary
tree of depth k. A query is initially made of all the terminals by querying all of the leaf nodes that
stem from the root node. (This is node A in Figure 1.) If all terminals respond negatively the algorithm
is complete. If at least one of the terminals responds positively, then a query is made of the terminals which
stem from the node whose leaves are those in the upper half of the tree. (This is node B in Figure 1.)
If all terminals in this subset respond negatively the terminals corresponding to leaf nodes in the lower
half of the tree are then queried. (This is node C in Figure 1.) Whenever a query of $2^{\ell}$ leaf nodes
($\ell \ge 1$) produces a positive response, the $2^{\ell}$ nodes are subdivided into two sets of $2^{\ell-1}$ nodes and each is
queried separately. The process is iterated until, finally, individual leaf nodes are queried and the
responses of all terminals are determined.
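The non-adaptive procedure can be traced with a short recursive sketch. It is an illustration only: the function name `hayes_queries` and the representation of a subtree by the half-open range of its leaf indices are assumptions made here, not notation from the paper.

```python
def hayes_queries(positive, lo, hi):
    """Return the (lo, hi, answer) queries asked for terminals lo .. hi-1.

    `positive` is the set of terminals that wish to respond positively. A
    subtree is queried as a whole; if energy appears on the channel and the
    subtree holds more than one leaf, both halves are then queried in turn.
    """
    answer = any(lo <= t < hi for t in positive)   # energy on the channel?
    log = [(lo, hi, answer)]
    if answer and hi - lo > 1:
        mid = (lo + hi) // 2
        log += hayes_queries(positive, lo, mid)    # upper half of the subtree
        log += hayes_queries(positive, mid, hi)    # lower half of the subtree
    return log


# Common example: 16 terminals, with terminals 1, 10 and 11 responding positively.
example_log = hayes_queries({1, 10, 11}, 0, 16)
print(len(example_log))
```

For the common example the trace contains 15 queries, starting with the root (all 16 terminals) and ending with the negative query of terminals 12 to 15.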


[Figure 1: binary tree with leaf nodes 0 to 15 and internal nodes A to Q; leaf nodes 1, 10 and 11 are
marked with asterisks.]

Figure 1. A common example for all algorithms: 16 terminals with terminals
1, 10 and 11 responding positively.


Since the algorithm is known to all terminals, and since the responses to the queries are available to all
terminals, no questions need to be asked. Rather the terminals respond to the next implicit question in
the algorithm without any time (or bits) being wasted by actually asking the questions. The response to
each implicit query only involves the terminals queried sending one bit of information.

To illustrate this algorithm consider the example of the 16 terminal network numbered (0, 1, ..., 15) shown
in Figure 1 where terminals 1, 10 and 11 wish to respond positively and all other terminals wish to respond
negatively. For each implicit question in the algorithm, the following table contains the node in the tree
from which the subtree grows, the leaf nodes (or terminals) which are being queried on each question, and
the response which appears on the channel (yes or no).

Table I

Queries and Responses for the Example Given in Figure 1 Using the Hayes Algorithm

Question No.   Node in Tree   Terminals Being Queried     Response
     1              A         all                         yes
     2              B         0,1,2,3,4,5,6,7             yes
     3              D         0,1,2,3                     yes
     4              H         0,1                         yes
     5              0         0                           no
     6              1         1                           yes
     7              J         2,3                         no
     8              E         4,5,6,7                     no
     9              C         8,9,10,11,12,13,14,15       yes
    10              F         8,9,10,11                   yes
    11              M         8,9                         no
    12              N         10,11                       yes
    13              10        10                          yes
    14              11        11                          yes
    15              G         12,13,14,15                 no

In this example, 15 queries were required to determine the responses of all 16 nodes. Round-robin polling
would have required 16 queries so the savings here were not impressive. In order to determine what savings
(if any) this algorithm achieves over round-robin polling, one requires either a mathematical analysis of
the algorithm or a simulation. Fortunately, this algorithm admits a neat mathematical analysis, the
details of which are given in the Appendix. The end result is a recursive formula which gives the average
number of queries required for a network of $2^k$ terminals, $E[Q_k(p)]$, in terms of the average number of queries
for a network with $2^{k-1}$ terminals, $E[Q_{k-1}(p)]$. This formula is

$$E[Q_k(p)] = 2E[Q_{k-1}(p)] + 1 - 2(1-p)^{2^k}, \qquad k \ge 1.$$

The initial condition to begin this recursive calculation is $E[Q_0(p)] = 1$ since it requires only one query
to poll a network with a single terminal irrespective of the value of p. Numerical results for $E[Q_k(p)]$
for k = 1 to 10 and p = .1 to .5 in steps of .1 are given in Table II.
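The recursion is easy to evaluate numerically. The following sketch, with the hypothetical function name `expected_queries`, simply iterates the formula from the initial condition; it is offered as a way to reproduce tabulated values, not as code from the paper.

```python
def expected_queries(k, p):
    """Average number of queries E[Q_k(p)] for the unmodified Hayes algorithm."""
    e = 1.0                                      # initial condition E[Q_0(p)] = 1
    for j in range(1, k + 1):
        e = 2 * e + 1 - 2 * (1 - p) ** (2 ** j)  # recursion for 2^j terminals
    return e


# Print the per-terminal cost E[Q_k(p)] / 2^k for p = 0.1 and k = 1..10.
for k in range(1, 11):
    e = expected_queries(k, 0.1)
    print(k, round(e, 3), round(e / 2 ** k, 3))
```

Dividing by $2^k$ gives the average number of queries per terminal, which can be compared directly with the value 1 for round-robin polling.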


Table II

Unmodified Hayes

  p     k    E[Q_k(p)]    E[Q_k(p)]/2^k
 .1     1       1.380         .690
 .1     2       2.448         .612
 .1     3       5.035         .629
 .1     4      10.699         .669
 .1     5      22.329         .698
 .1     6      45.655         .713
 .1     7      92.310         .721
 .1     8     185.621         .725
 .1     9     372.242         .727
 .1    10     745.483         .728
 .2     1       1.720         .860
 .2     2       3.621         .905
 .2     3       7.906         .988
 .2     4      16.756        1.047
 .2     5      34.510        1.078
 .2     6      70.020        1.094
 .2     7     141.040        1.102
 .2     8     283.080        1.106
 .2     9     567.161        1.108
 .2    10    1135.322        1.109
 .3     1       2.020        1.010
 .3     2       4.560        1.140
 .3     3      10.004        1.251
 .3     4      21.002        1.313
 .3     5      43.004        1.344
 .3     6      87.008        1.359
 .3     7     175.016        1.367
 .3     8     351.031        1.371
 .3     9     703.062        1.373
 .3    10    1407.125        1.374
 .4     1       2.280        1.140
 .4     2       5.301        1.325
 .4     3      11.568        1.446
 .4     4      24.135        1.508
 .4     5      49.271        1.540
 .4     6      99.542        1.555
 .4     7     200.084        1.563
 .4     8     401.167        1.567
 .4     9     803.334        1.569
 .4    10    1607.669        1.570
 .5     1       2.500        1.250
 .5     2       5.875        1.469
 .5     3      12.742        1.593
 .5     4      26.484        1.655
 .5     5      53.969        1.687
 .5     6     108.937        1.702
 .5     7     218.875        1.710
 .5     8     438.750        1.714
 .5     9     878.499        1.716
 .5    10    1757.998        1.717


This completes the discussion of the original Hayes algorithm for the non-adaptive case. The essence of
the adaptive case is the notion that for certain values of p it may be advantageous to treat $2^k$ terminals
as two distinct sets of $2^{k-1}$ terminals (or four distinct sets of $2^{k-2}$ terminals, etc.) which are to be
polled separately. In order to determine the optimum partitioning of the set of terminals we compute for
each value of k and p the quantity $E[Q_k(p)]/2^k$. For a given value of p, we then denote by $k^*(p)$ the value
of k for which $E[Q_k(p)]/2^k$ is a minimum. For a network with $2^k$ terminals, let $k_{OPT}$ [illegible in source] > 1. Otherwise

$$k_{OPT} = \begin{cases} k & \text{if } k < k^* \\ k^* & \text{if } k \ge k^*. \end{cases}$$

Then one should partition the $2^k$ terminals into $2^{k-k_{OPT}}$ groups, each containing $2^{k_{OPT}}$ nodes, and each group
should be polled separately using the non-adaptive algorithm. The average number of queries required is
then $2^{k-k_{OPT}} E[Q_{k_{OPT}}(p)]$. If $k^*(p) = 0$ the adaptive algorithm reduces to a round-robin algorithm. The
values of $E[Q_k(p)]/2^k$ are also contained in Table II.

For example, in Figure 1, if the 16 terminals were treated as 4 sets containing 4 terminals each, only 12
queries would be required. Similarly, 12 queries would be required if the 16 terminals were treated as 8
sets containing 2 terminals each.
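The effect of partitioning can be checked on the common example and against the recursion. The sketch below rests on assumptions that are not taken from the paper: the function names are hypothetical, the search for $k^*(p)$ covers k = 0 to 10 (with k = 0 standing for round-robin polling, one query per terminal), and groups are taken as consecutive blocks of leaves.

```python
def expected_queries(k, p):
    """E[Q_k(p)] for the non-adaptive algorithm, with E[Q_0(p)] = 1."""
    e = 1.0
    for j in range(1, k + 1):
        e = 2 * e + 1 - 2 * (1 - p) ** (2 ** j)
    return e


def k_star(p, k_max=10):
    """Value of k in 0..k_max minimising E[Q_k(p)] / 2^k (assumed search range)."""
    return min(range(k_max + 1), key=lambda k: expected_queries(k, p) / 2 ** k)


def adaptive_expected(k, p):
    """Average queries when 2^k terminals are polled in groups of 2^k_opt."""
    k_opt = min(k, k_star(p))
    return 2 ** (k - k_opt) * expected_queries(k_opt, p)


def count_queries(positive, lo, hi):
    """Queries used by the non-adaptive algorithm on terminals lo .. hi-1."""
    if hi - lo == 1 or not any(lo <= t < hi for t in positive):
        return 1
    mid = (lo + hi) // 2
    return 1 + count_queries(positive, lo, mid) + count_queries(positive, mid, hi)


def grouped_queries(positive, n, group_size):
    """Poll n terminals as consecutive groups of group_size terminals each."""
    return sum(count_queries(positive, start, start + group_size)
               for start in range(0, n, group_size))


# Common example: groups of 16, 4, 2 and 1 terminals.
for size in (16, 4, 2, 1):
    print(size, grouped_queries({1, 10, 11}, 16, size))
```

With groups of 4 or of 2 terminals the example needs 12 queries, against 15 for a single group of 16 and 16 for round-robin polling, in agreement with the counts quoted in the text.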


3.  THE MODIFIED HAYES SCHEME


The astute reader will have noticed that the Hayes algorithm, as described in the previous section, asks
some questions to which the answers could have been predicted with certainty. Specifically, if $2^m$ leaf
nodes are polled ($m \ge 1$), and a response is obtained, yet no response is obtained when the first half of
the terminals are queried, it is certain that a positive response will be obtained when the second half
are queried. Thus, these questions can be omitted from the algorithm with no loss of performance. (This
latter statement assumes that all terminals reliably receive all responses. If errors can occur, these
redundant queries and responses stabilize the algorithm.) The modified Hayes algorithm suggested here is
thus to omit unnecessary questions. This modification can be used with either the non-adaptive or adaptive
scheme.

For the example of Figure 1 and Table I, using the non-adaptive scheme, queries 6 and 12 can be omitted since
the answers to these questions are certainly "yes". Thus the number of queries, for this example, using
the modified version of the non-adaptive scheme would be 13 instead of 15. It is left to the reader to
count the queries for the modified version of the adaptive scheme.
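The skipping rule can be added to a recursive trace of the algorithm with one extra flag. This is a sketch under the same illustrative conventions as before (hypothetical function name, subtrees given as half-open ranges of leaf indices); it assumes that all responses are received reliably, as the text requires.

```python
def modified_hayes(positive, lo, hi, known_yes=False):
    """Return the queries actually asked for terminals lo .. hi-1.

    When `known_yes` is True the answer for this subtree is already certain
    (its parent answered yes and its sibling answered no), so the query on
    the subtree itself is skipped and only its halves are examined.
    """
    answer = any(lo <= t < hi for t in positive)
    log = [] if known_yes else [(lo, hi, answer)]
    if answer and hi - lo > 1:
        mid = (lo + hi) // 2
        first = modified_hayes(positive, lo, mid)   # first half is always asked
        first_yes = first[0][2]                     # answer to the first half
        log += first
        log += modified_hayes(positive, mid, hi, known_yes=not first_yes)
    return log


# Common example: terminals 1, 10 and 11 respond positively.
modified_log = modified_hayes({1, 10, 11}, 0, 16)
print(len(modified_log))
```

For the common example the queries of leaf 1 and of the pair (10, 11) disappear from the trace, leaving 13 queries.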


It is desirable to know the average number of queries (or responses) required with the modified Hayes
algorithm to poll $2^k$ terminals, each of which has a probability p of answering yes. Calling this quantity
$E[Q'_k(p)]$, the following recursive formula can be derived:

$$E[Q'_k(p)] = 2E[Q'_{k-1}(p)] + 1 - (1-p)^{2^{k-1}} - (1-p)^{2^k}, \qquad k \ge 1.$$

Again the initial condition is $E[Q'_0(p)] = 1$. Table III gives numerical results for the average number of
queries as well as the information required in order to determine the best adaptive scheme.
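As with the unmodified scheme, the recursion can be iterated directly. The function name `expected_queries_modified` is a hypothetical choice for this sketch.

```python
def expected_queries_modified(k, p):
    """Average number of queries E[Q'_k(p)] for the modified Hayes algorithm."""
    q = 1 - p
    e = 1.0                                   # initial condition E[Q'_0(p)] = 1
    for j in range(1, k + 1):
        # Recursion for 2^j terminals built from two halves of 2^(j-1) terminals.
        e = 2 * e + 1 - q ** (2 ** (j - 1)) - q ** (2 ** j)
    return e


# Per-terminal cost of the modified scheme for p = 0.1 and k = 1..10.
for k in range(1, 11):
    e = expected_queries_modified(k, 0.1)
    print(k, round(e, 3), round(e / 2 ** k, 3))
```

For p = 0.1 the smallest per-terminal cost occurs at k = 3, that is, for groups of 8 terminals.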

Table III

Modified Hayes

  p     k    E[Q'_k(p)]   E[Q'_k(p)]/2^k
 .1     1       1.290         .645
 .1     2       2.114         .528
 .1     3       4.141         .518
 .1     4       8.667         .542
 .1     5      18.114         .566
 .1     6      37.192         .581
 .1     7      75.383         .589
 .1     8     151.766         .593
 .1     9     304.531         .595
 .1    10     610.062         .596
 .2     1       1.560         .780
 .2     2       3.070         .768
 .2     3       6.563         .820
 .2     4      13.931         .871
 .2     5      28.833         .901
 .2     6      58.665         .917
 .2     7     118.330         .924
 .2     8     237.660         .928
 .2     9     476.321         .930
 .2    10     953.641         .931
 .3     1       1.810         .905
 .3     2       3.890         .972
 .3     3       8.482        1.060
 .3     4      17.903        1.119
 .3     5      36.803        1.150
 .3     6      74.606        1.166
 .3     7     150.212        1.174
 .3     8     301.423        1.177
 .3     9     603.847        1.179
 .3    10    1208.694        1.180
 .4     1       2.040        1.020
 .4     2       4.590        1.148
 .4     3      10.034        1.254
 .4     4      21.052        1.316
 .4     5      43.103        1.347
 .4     6      87.206        1.363
 .4     7     175.413        1.370
 .4     8     351.825        1.374
 .4     9     704.651        1.376
 .4    10    1410.302        1.377
 .5     1       2.250        1.125
 .5     2       5.188        1.297
 .5     3      11.309        1.414
 .5     4      23.613        1.476
 .5     5      48.227        1.507
 .5     6      97.453        1.523
 .5     7     195.906        1.531
 .5     8     392.812        1.534
 .5     9     786.624        1.536
 .5    10    1574.249        1.537

4.  NEW ALGORITHM (Gudjohnsen, E. et al., 1980)

In this section we describe a new algorithm which, in general, requires fewer queries than the Hayes
algorithm (even when modified) but requires more bits in each response. Several methods of evaluating the
performance of this algorithm will be described. For each method, we will compare the results of this analysis
with similar results for the Hayes algorithm. This algorithm is based upon a method originally put forth by
Capetanakis (Capetanakis, J.I., 1979) for random access. We differ from Capetanakis in that he was
concerned with the terminals sending a multi-bit message whereas we are concerned with the terminals only
reporting their response to a single yes/no question. Furthermore, the focus of Capetanakis's work was on the
situation with an infinite number of users whereas we are concerned with the case of a finite number of
terminals.




In this algorithm each of the $2^k$ terminals is assigned a unique k bit signature. If the terminal wishes
to respond positively it emits its signature. Again subsets of terminals are queried. If no signatures
are imposed on the channel in response to a query of a subset of terminals, all terminals know that
every queried terminal in that subset responded negatively. It is only when two or more impose their
signature in response to a query that further queries are required. The exception to this latter statement
is if only two terminals are queried. Then no matter what the response, the responses of these terminals
are known by all.

Again an adaptive and non-adaptive version of this algorithm can be envisioned. We describe the
non-adaptive version first. A query is initially asked of all $2^k$ terminals in the network. If none of the
terminals respond or one of the terminals responds by transmitting its k bit signature the algorithm is
complete. The algorithm is also complete if k = 1 irrespective of the response. If, however, for $k \ge 2$, two
or more terminals respond by transmitting their signatures, the $2^k$ terminals are subdivided into 2 subsets
containing $2^{k-1}$ terminals each and the process is repeated for each of the subsets until all of the
responses are known. Questions which provide no new information are skipped just as in the modified Hayes
algorithm.

This algorithm again can be thought of in terms of querying leaf nodes of a binary tree stemming from given
internal nodes of the tree. The queries and responses for the example given in Figure 1 when this algorithm
is employed are given in Table IV. The signature of the ith user is assumed to be the 4-bit binary
representation of the decimal number i (i.e., 0 → 0000, 1 → 0001, ..., 15 → 1111). Furthermore, XXXX is used
to denote the response when two or more signatures are transmitted, and φ is used to denote no response.

Table IV

Queries and Responses for Example Given in Table 1 Using New Algorithm

Question Number    Node in Tree    Terminals Being Queried    Response
       1                A          all                        XXXX
       2                B          0,1,...,7                  0001
       3                C          8,9,...,15                 XXXX
       4                F          8,9,10,11                  XXXX
       5                H          8,9                        φ
       6                G          12,13,14,15                φ

Note that, after two or more signatures were found as a response to query number 4 and no signatures were
found as a response to query number 5, all terminals knew that terminals 10 and 11 responded positively
(and 8 and 9 responded negatively).
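The query sequence of the non-adaptive version can be traced with a short simulation. The following is a minimal sketch, not the author's implementation: the names `respond` and `poll` and the strings used for a collision and for silence are illustrative assumptions. It assumes the halves are resolved depth-first, which is the order shown in Table IV, and that the only skipped question is the one whose answer is implied when the first half of a colliding set is silent.

```python
from typing import List, Set, Tuple

COLLISION = "XXXX"   # two or more signatures overlapped on the channel
SILENCE = "none"     # no terminal in the queried subset answered

def respond(lo: int, hi: int, positive: Set[int], k: int) -> str:
    """Channel outcome when terminals lo..hi-1 are queried."""
    hits = [t for t in range(lo, hi) if t in positive]
    if not hits:
        return SILENCE
    if len(hits) == 1:
        return format(hits[0], f"0{k}b")   # the lone k-bit signature
    return COLLISION

def poll(positive: Set[int], k: int) -> List[Tuple[str, str]]:
    """Return the (queried range, response) pairs actually sent."""
    log: List[Tuple[str, str]] = []

    def resolve(lo: int, hi: int, known_collision: bool) -> str:
        if known_collision:
            answer = COLLISION             # implied, so the query is skipped
        else:
            answer = respond(lo, hi, positive, k)
            log.append((f"{lo}-{hi - 1}", answer))
        # Silence, a single signature, or any answer from two terminals
        # settles the subset.
        if answer != COLLISION or hi - lo == 2:
            return answer
        mid = (lo + hi) // 2
        first = resolve(lo, mid, False)
        # If the first half was silent, the second half must hold the collision.
        resolve(mid, hi, first == SILENCE)
        return answer

    resolve(0, 2 ** k, False)
    return log

if __name__ == "__main__":
    for queried, answer in poll({1, 10, 11}, 4):
        print(queried, answer)
```

With terminals 1, 10 and 11 responding positively among 16 terminals, the sketch issues six queries, and the question for terminals 10 and 11 is never asked.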



Let E[Q''_k(p)] denote the average number of queries required by this algorithm to poll 2^k terminals, each
of which has probability p of responding positively. The recursive formula which determines this quantity is

    E[Q''_k(p)] = 2 E[Q''_{k-1}(p)] + 1 - (1-p)^{2^{k-1}} - (1-p)^{2^k} - 3 2^{k-1} p (1-p)^{2^k - 1}

for k ≥ 2. The initial condition here is E[Q''_1(p)] = 1 since we need exactly one query to poll two terminals
using this algorithm. Numerical values for E[Q''_k(p)] are given in Table V.
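The recursion for the average number of queries is easy to evaluate numerically. The sketch below is an illustration only: the function name `expected_queries` is an assumption, and the form of the correction terms is a reconstruction obtained by conditioning on whether each half of the queried set contains zero, one, or several positive responders. It was checked against the tabulated values rather than copied from a legible formula.

```python
def expected_queries(k: int, p: float) -> float:
    """Average number of queries to poll 2**k terminals (new scheme).

    Reconstructed recursion (an assumption, see the lead-in):
        E_k = 2*E_{k-1} + 1 - z - z**2 - 3*z*o
    with z = P(no positive in a half) and o = P(exactly one positive in a half).
    """
    q = 1.0 - p
    e = 1.0                                   # E_1 = 1: one query polls two terminals
    for j in range(2, k + 1):
        half = 2 ** (j - 1)                   # terminals in each half
        z = q ** half                         # one half entirely negative
        zo = half * p * q ** (2 * half - 1)   # z times P(exactly one positive in a half)
        e = 2.0 * e + 1.0 - z - z * z - 3.0 * zo
    return e

if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        for k in range(1, 11):
            e = expected_queries(k, p)
            print(f"p={p:.1f} k={k:2d} E={e:9.3f} E/2^k={e / 2 ** k:.3f}")
```

Dividing the result by 2^k gives the queries-per-terminal column of the table.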


Table V

New Scheme - Count of Queries

  p      k     E[Q''_k(p)]    E[Q''_k(p)]/2^k
 .1      1        1.000           .5
 .1      2        1.096           .274
 .1      3        1.532           .192
 .1      4        2.955           .185
 .1      5        6.507           .203
 .1      6       13.967           .218
 .1      7       28.932           .226
 .1      8       58.864           .230
 .1      9      118.728           .232
 .1     10      238.455           .233
 .2      1        1.000           .5
 .2      2        1.336           .334
 .2      3        2.591           .324
 .2      4        5.818           .364
 .2      5       12.597           .394
 .2      6       26.194           .409
 .2      7       53.387           .417
 .2      8      107.774           .421
 .2      9      216.549           .423
 .2     10      434.097           .424
 .3      1        1.000           .5
 .3      2        1.652           .413
 .3      3        3.711           .464
 .3      4        8.326           .520
 .3      5       17.649           .552
 .3      6       36.298           .567
 .3      7       73.597           .575
 .3      8      148.194           .579
 .3      9      297.388           .581
 .3     10      595.775           .582
 .4      1        1.000           .5
 .4      2        1.992           .498
 .4      3        4.703           .588
 .4      4       10.385           .649
 .4      5       21.769           .680
 .4      6       44.539           .696
 .4      7       90.078           .704
 .4      8      181.156           .708
 .4      9      363.312           .710
 .4     10      727.623           .711
 .5      1        1.000           .5
 .5      2        2.313           .578
 .5      3        5.512           .689
 .5      4       12.019           .751
 .5      5       25.038           .782
 .5      6       51.077           .798
 .5      7      103.153           .806
 .5      8      207.306           .810
 .5      9      415.613           .812
 .5     10      832.225           .813

[...] the quantity needed to determine the optimum partition [...] (The details follow exactly as for the
Hayes algorithm.)

Although the new algorithm [...], the responses for the Hayes algorithm are only [...]. In order to compare
bits in the responses we note that, since all terminals know which subset is being queried, the terminals
then know part of the signature of the responders. Only the portion of the signature which is not common
[...] need be transmitted. For example, referring to [...] be 001 instead of 0001 since the signatures [...]
be


    E[B_k(p)] = 2 E[B_{k-1}(p)] + k - (k-1) [ (1-p)^{2^{k-1}} + (1-p)^{2^k} + 3 2^{k-1} p (1-p)^{2^k - 1} ]

for k ≥ 2 with initial condition E[B_1(p)] = 1. [...] define E[B*_k(p)] as the average [...]

    E[B*_k(p)] = 2 E[B*_{k-1}(p)] + k - (k-1) (1-p)^{2^{k-1}} - 2 (1-p)^{2^k} - (k+1) 2^{k-1} p (1-p)^{2^k - 1}

for k ≥ 2, again with initial condition E[B*_1(p)] = 1. Numerical values of these quantities are given in
Table VI.

Let us compare [...] Hayes algorithm.
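The two bit-count recursions can be evaluated in the same way as the query count. The sketch below rests on a counting convention that is an assumption, since the defining sentences are not legible in this copy: a response to a query of 2^j terminals is charged j bits, and for the starred quantity a null response is charged 1 bit instead. Under that assumption the recursions reproduce the Table VI entries that were checked. The names `expected_bits` and `short_null` are illustrative.

```python
def expected_bits(k: int, p: float, short_null: bool = False) -> float:
    """Average number of response bits to poll 2**k terminals (new scheme).

    short_null=False: every response to a query of 2**j terminals costs j bits.
    short_null=True:  as above, but a null response costs 1 bit (assumed
                      meaning of the starred quantity).
    """
    q = 1.0 - p
    b = 1.0                                   # initial condition for k = 1
    for j in range(2, k + 1):
        half = 2 ** (j - 1)
        z = q ** half                         # one half entirely negative
        zo = half * p * q ** (2 * half - 1)   # z times P(exactly one positive in a half)
        if short_null:
            b = 2.0 * b + j - (j - 1) * z - 2.0 * z * z - (j + 1) * zo
        else:
            b = 2.0 * b + j - (j - 1) * (z + z * z + 3.0 * zo)
    return b

if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        for k in range(1, 11):
            print(f"p={p:.1f} k={k:2d} "
                  f"B={expected_bits(k, p):9.3f} "
                  f"B*={expected_bits(k, p, short_null=True):9.3f}")
```

The two quantities coincide at k = 1 and the starred one is never larger, because shortening null responses can only remove bits.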


Table VI

New Scheme - Bit Count

  p      k     E[B_k(p)]     E[B*_k(p)]
 .1      1        1.000         1.000
 .1      2        2.096         1.440
 .1      3        3.872         2.942
 .1      4        8.414         7.399
 .1      5       20.217        18.622
 .1      6       46.194        43.041
 .1      7       99.381        93.075
 .1      8      206.762       194.150
 .1      9      422.524       397.301
 .1     10      855.048       804.601
 .2      1        1.000         1.000
 .2      2        2.336         1.926
 .2      3        5.511         5.027
 .2      4       13.927        13.213
 .2      5       32.700        31.293
 .2      6       71.396        68.581
 .2      7      149.791       144.162
 .2      8      307.583       296.324
 .2      9      624.166       601.648
 .2     10     1258.331      1213.296
 .3      1        1.000         1.000
 .3      2        2.652         2.412
 .3      3        7.117         6.834
 .3      4       17.948        17.431
 .3      5       40.881        39.849
 .3      6       87.762        85.698
 .3      7      182.524       178.396
 .3      8      373.048       364.793
 .3      9      755.096       738.585
 .3     10     1520.193      1487.171
 .4      1        1.000         1.000
 .4      2        2.992         2.862
 .4      3        8.422         8.253
 .4      4       20.780        20.447
 .4      5       46.559        45.893
 .4      6       99.118        97.787
 .4      7      205.237       202.573
 .4      8      418.473       413.146
 .4      9      845.947       835.293
 .4     10     1701.894      1680.586
 .5      1        1.000         1.000
 .5      2        3.313         3.250
 .5      3        9.398         9.305
 .5      4       22.784        22.597
 .5      5       50.568        50.194
 .5      6      107.136       106.388
 .5      7      221.272       219.776
 .5      8      450.544       447.552
 .5      9      910.087       904.104
 .5     10     1830.175      1818.207

APPENDIX

We consider the unmodified Hayes scheme described in Section 2. We assume we have a tree consisting of 2^k
leaf nodes, the upper subtree with 2^{k-1} leaf nodes and the lower subtree with 2^{k-1} leaf nodes. Consider
the following four mutually exclusive and exhaustive events:

    E_1: No positive responses from the 2^k terminals.

    E_2: No positive responses from the upper 2^{k-1} terminals; one or more positive responses from the
         lower 2^{k-1} terminals.

    E_3: No positive responses from the lower 2^{k-1} terminals; one or more positive responses from the
         upper 2^{k-1} terminals.

    E_4: One or more positive responses from both the lower and the upper set of 2^{k-1} terminals.

The probabilities of these four events are:

    P(E_1) = (1-p)^{2^k},

    P(E_2) = P(E_3) = (1-p)^{2^{k-1}} (1 - (1-p)^{2^{k-1}}),

    P(E_4) = (1 - (1-p)^{2^{k-1}})^2.

Let E[Q_k(p)] be the average number of queries required to poll the 2^k terminals. Then

    E[Q_k(p)] = sum_{i=1}^{4} E[Q_k(p)|E_i] P(E_i).

But

    E[Q_k(p)|E_1] = 1,

    E[Q_k(p)|E_2] = E[Q_k(p)|E_3] = 2 + E[Q_{k-1}(p)|E_5],

    E[Q_k(p)|E_4] = 1 + 2 E[Q_{k-1}(p)|E_5],

where E_5 is the event that there are one or more positive responses from a set of 2^{k-1} terminals.
But it is easy to verify that

    E[Q_{k-1}(p)|E_5] = (E[Q_{k-1}(p)] - (1-p)^{2^{k-1}}) / (1 - (1-p)^{2^{k-1}}).

Substituting, we then find that

    E[Q_k(p)] = 1 + 2 E[Q_{k-1}(p)] - 2 (1-p)^{2^k},

which is the desired result.

Similar derivations yield the other recursive formulas given in this paper.
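The substitution step can be checked numerically by evaluating the recursion twice, once in its closed form and once directly from the four-event decomposition. The sketch below is illustrative only. The function names are assumptions, and so is the starting value E[Q_0(p)] = 1 (a single terminal always costs one query), which is chosen here for convenience and is not taken from Section 2. The event form requires p > 0 so that the conditioning event has nonzero probability.

```python
def hayes_closed(k: int, p: float) -> float:
    """E[Q_k] from E[Q_k] = 1 + 2 E[Q_{k-1}] - 2 (1-p)^(2^k)."""
    q = 1.0 - p
    e = 1.0                              # assumed E[Q_0]: one terminal, one query
    for j in range(1, k + 1):
        e = 1.0 + 2.0 * e - 2.0 * q ** (2 ** j)
    return e

def hayes_by_events(k: int, p: float) -> float:
    """E[Q_k] summed over the events E_1..E_4 of the Appendix (needs p > 0)."""
    q = 1.0 - p
    e = 1.0                              # same assumed starting value
    for j in range(1, k + 1):
        z = q ** (2 ** (j - 1))          # P(no positive response in one subtree)
        cond = (e - z) / (1.0 - z)       # E[Q_{j-1} | at least one positive]
        e = (z * z * 1.0                                 # E_1
             + 2.0 * z * (1.0 - z) * (2.0 + cond)        # E_2 and E_3
             + (1.0 - z) ** 2 * (1.0 + 2.0 * cond))      # E_4
    return e

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, hayes_closed(6, p), hayes_by_events(6, p))
```

Agreement of the two functions to rounding error confirms that the terms in (1-p)^{2^{k-1}} cancel, leaving only the (1-p)^{2^k} term in the closed form.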

ACKNOWLEDGEMENT 

This work was supported by the National Science Foundation under Grant F3C-7S2114Q. Much of this work was
done in collaboration with Professor Don Towsley and the author gratefully acknowledges his permission to
publish this work here.

REFERENCES 

Capetanakis, J. I., September 1979, "Tree Algorithms for Packet Broadcast Channels," IEEE Trans. on
Information Theory, Vol. IT-25, pp. 505-515.

Gudjohnsen, E., D. Towsley and J. K. Wolf, June 1980, "On Adaptive Polling Techniques for Computer
Communication Networks," ICC Conference Record, Vol. 1, pp. 13.3.1-13.3.5.

Hayes, J. F., August 1978, "An Adaptive Technique for Local Distribution," IEEE Trans. on Communications,
Vol. COM-26, pp. 1178-1186.


DISCUSSIONS 
SESSION  IV 

REFERENCE NO. OF PAPER: IV-15

DISCUSSOR'S NAME: Harvey Nelson, Naval Weapons Center, USA
AUTHOR'S NAME: A. O. Ward

COMMENT: I'm very interested in your concept of an engineer work station. That is an integrated net
of automated tools for developing requirements and proceeding on to PSL/PSA and code. When do you
expect to have the work station operational? What will be the steps of implementation? What marketing
or availability plans do you foresee?

AUTHOR'S REPLY: We hope to have the work station operational in the first half of 1982 but I would
prefer not to detail the implementation steps here. The tool is being developed for in-house use and
we do not have any short-term plans to make it commercially available.


REFERENCE  NO.  OF  PAPER:  IV-15 

DISCUSSOR'S NAME: Dr. N. J. B. Yeung, Ultra Electronic Controls
AUTHOR'S NAME: A. O. Ward

COMMENT: Have you considered producing code automatically in a second language, e.g. PASCAL, as well
as in CORAL? A second code implementation would enable you to perform a dual-coding type test of the
first implementation (but would not help in checking the specification, of course).

AUTHOR'S REPLY: We have considered languages other than CORAL but not for the reason you cite. Taking
a long view, we will certainly wish to have a similar capability for use with the Ada language and a
program of work is being considered to achieve this.


REFERENCE NO. OF PAPER: IV-15
DISCUSSOR'S NAME: Dr. von Issendorff
AUTHOR'S NAME: A. O. Ward

COMMENT: I am quite impressed by your method, which seems to be very usable. But in case I would
like to select your method or another one (and there are many more) I would not have the means to do
so. So, could you please compare your method to others?

AUTHOR'S REPLY: To answer this question properly is clearly outside the scope of this meeting. So, I
would like to respond in two ways. First, when we were formulating our ideas on requirements analysis
just over 2 years ago, there seemed to be few alternatives. TRW's RSL/REVS system, although powerful,
was not commercially available and the host machine and language were not compatible with our
environment. SADT was not widely accessible in the United Kingdom and we understood that efforts to
model SADT descriptions in PSL had not proved successful at that time.

As far as tools were concerned, there were two alternatives, Michigan's PSL/PSA and the U.K.
system SOS. The latter required significant front-end effort to be made practicable and again was only
available on a host machine to which we did not have access. PSL/PSA, on the other hand, had a rich
language and was supported on our mainframes.

The second point I would make is that there are two studies which Dr. von Issendorff may find
useful. The first was sponsored by RSRE and is in the public domain, being an international survey of
requirements analysis methods and tools. The second is currently being undertaken by the Department of
Industry and is entitled "DoI Ada Methodology Study." The latter should report before the end of 1981.


REFERENCE NO. OF PAPER: IV-17
DISCUSSOR'S NAME: K. Brammer, ESG
AUTHOR'S NAME: Enslow (Livesey, presenter)

COMMENT: Would you explain how priority interrupts/requests are handled by the fully distributed
processing system (where the participating units seem to have equal rights); for instance, if a fire
control component within an avionic system needs instant action. Can you elaborate on the notion of
the "price" a user of the FDPS has to offer while bidding for being served? Is it meant literally (in
dollars) or is it an abstract concept?

AUTHOR'S REPLY: (1) Components can have equal rights in the sense of cooperative autonomy, but still
have differing priorities. If a user in the system needs (and deserves) instant service, then it is
effectively bidding a very high "price" and should win most contests for resources.

An extremely high priority user might even have dedicated resources.

(2) The price in bidding is whatever is meaningful in the system: dollars, budget, or
priorities, etc.

(This is the opinion of the presenter, not necessarily that of the author.)


REFERENCE  NO.  OF  PAPER:  IV-17 
DISCUSSOR'S NAME: T. Smestad, NDRE, Norway
AUTHOR'S NAME: Enslow (Livesey, presenter)

COMMENT: When trying to increase the performance of decentralized decision making, one is often faced
with the second-guessing phenomenon (a decision maker anticipates the actions of other decision makers
to make his own actions more effective). Is this phenomenon present in your problem formulations, and
can it be used to improve the performance?

AUTHOR'S REPLY: One classic example of this is the bidding problem. One asks for a resource. One is
told it is available, but then when one reserves the resource, it has already been taken by someone
else. This is due to time delays in inquiries and reservations. At Georgia Tech, we are actively
investigating this problem.

(This is the opinion of the presenter, not necessarily that of the author.)


REFERENCE  NO.  OF  PAPER:  IV-18 
DISCUSSOR'S NAME: Enslow (Livesey)

AUTHOR'S NAME: L. Svobodova, INRIA

COMMENT: Is it not true that in this system message passing would have to be atomic (if a crash
occurred after the textual part of a message arrived, but before the message identifier did (or
vice-versa) then an inconsistent state might result)? Do you know any system in which this is taken
care of?

AUTHOR'S REPLY: (1) It is not necessary that the communication subsystem deliver messages atomically.
However, the receiver must be able to check the integrity of a request. If the textual part of a
message arrived before the identifier of the atomic action to be created by the request, the request
would not be processed, since the first thing that must be done is to create a commit record, for which
it is necessary to have the identifier. If the textual part got lost, the atomic action opened when
the identifier was received would be aborted, since a timeout is associated with each commit record.

(2) Communication protocols that provide virtual connections deliver messages atomically; however, it
does not mean yet that a message is delivered atomically to the destination process. I do not know any
distributed system that implements atomic process-to-process communication.


REFERENCE  NO.  OF  PAPER:  IV-18 
DISCUSSOR'S NAME: Van Keuk, AVP Member
AUTHOR'S NAME: L. Svobodova

COMMENT: Crashes, as you said, can occur as a consequence of incorrect stochastic data as we are faced
with in signal processing applications. Do you see a conflict between your technique and backtracing
facilities for test and debugging?

AUTHOR'S REPLY: It is true that automatic rollback to an earlier state conflicts with testing and
debugging, where it is important to keep a trace also of the erroneous states. However, the mechanisms
that I described are intended to facilitate orderly recovery rather than impose it at all times. That
is, it would be possible to inhibit them while in a debugging stage. Also, we have been designing a
system where the object versions, even the invalidated ones, are preserved for an unlimited period of
time. The invalidated versions are not accessible to ordinary user programs, but they could be made
available to a debugger. (See reference SVOB 80)




REFERENCE NO. OF PAPER: IV-18

DISCUSSOR'S NAME: K. Shin, Rensselaer Polytechnic Institute, USA
AUTHOR'S NAME: L. Svobodova

COMMENT: Did you carry out any overhead analysis? Otherwise, how can you justify your proposed
method?

AUTHOR'S REPLY: No, I did not do any overhead analysis of the method that I have described in my
paper. Clearly, it is important to find out if this method is practical; however, I do not agree that
what is needed is overhead analysis. Overhead with respect to what? I believe that the ease of
developing reliable software offered by the method (a programmer can write application software without
any concern about restoring consistent state in case of a failure) is more important than the
additional processor time and memory needed to implement it. And, I believe that only experimental work
can demonstrate if the proposed method is practical.

REFERENCE  NO.  OF  PAPER:  IV-19 

DISCUSSOR’S  NAME:  Horst  Ulster,  Germany 

AUTHOR'S  NAME:  J.  K.  Wolf 

COMMENT: Why use a polling method at all? Why not issue a broadcast and then let all terminals with
a "yes" respond in a priority order? (Any modern system should be able to do more than polling.)

AUTHOR'S REPLY: Since any terminal initially only knows its own state, the suggested technique of
responding in priority order is equivalent to "roll-call" or "hub" polling, which the paper shows is not
as efficient as probing.

Also, I believe the discussor has in mind a system where the polling is used as a method of terminals
gaining channel access to transmit information. This is only one of many uses to which polling can be
put. Status collection is a different use.


REFERENCE  NO.  OF  PAPER:  IV-19 

DISCUSSOR'S  NAME:  K.  G.  Shin,  Rensselaer  Polytechnic  Institute,  USA 
AUTHOR'S  NAME:  Prof.  Wolf 

COMMENT: How would you handle an error in answering the query?

AUTHOR'S REPLY: Some of the algorithms are more sensitive to errors than others. As a rule of thumb,
the more efficient the algorithm, the less redundancy exists in the algorithm and thus the more
sensitive the algorithm is to errors.


REFERENCE  NO.  OF  PAPER:  IV-19 
DISCUSSOR'S NAME: J. H. Saltzer, MIT, USA
AUTHOR'S NAME: Prof. Wolf

COMMENT: How about applying this polling technique to a speed-limit circuit? In that case, there may
be a longer time required to poll a larger number of points because of fan-out. It would seem that
this effect would lead to a different optimum polling pattern.

AUTHOR'S REPLY: This is an excellent suggestion. We have considered a problem which is in some sense
the dual of the problem you suggest, whereby there is an upper limit to the number of times a station
can be "probed" in any given polling cycle. It certainly makes a great deal of sense to consider the
problem you suggest.


REFERENCE  NO.  OF  PAPER:  IV-19 
DISCUSSOR'S  NAME:  Dr.  Van  Keuk,  AVP  Member 
AUTHOR'S  NAME:  Prof.  Wolf 

COMMENT: For your analysis you need, as you said, assumptions on the statistical independence of the
events. I feel in addition you need the assumption of constant probability. If this is not given, how
do you modify your algorithm?

AUTHOR'S REPLY: We are presently working on just this problem. We are considering the simplest case
where we have two classes of stations, one class having probability P1 of being active and the other
class having probability P2 of being active. At this time, I cannot give you any concrete results
except to say that one must carefully match the algorithm to the assumptions on the statistics of the
stations in order to achieve an efficient scheme.




STAGE-STATE  RELIABILITY  ANALYSIS  TECHNIQUE 
Alan  D.  Stern 

Boeing  Military  Airplane  Company 
Digital  Flight  Controls  Research 
Seattle,  Washington 


SUMMARY 


Conventional reliability analysis techniques such as fault-tree and Boolean algebra methods are
difficult to apply to redundant systems with complex interactions and redundancy management
philosophies. Some advanced flight control systems, for example, employ multiple redundant channels
which, with proper redundancy management and failure detection, can degrade to simplex operation. The
reliability analysis must properly account for the defined success criteria, redundancy level,
redundancy management technique, system dependencies, and failure detection coverage. The Stage-State
reliability analysis technique properly accounts for these factors. It is also computationally simple:
triplex redundant systems have been analyzed using an early-1970s desktop computer.

This method is well suited for analysis by the system architect. The process begins with a system block
diagram showing all element connections. A success logic diagram is then written reflecting all
possible success states. The probability of success equation is written directly from the logic diagram
and evaluated by substituting the probability expression for each system element. Multiple success
criteria can be applied to one problem formulation simply by deleting those states which do not satisfy
the success criterion.

1.0  INTRODUCTION 

Advanced digital flight control systems (DFCS) for new aircraft are assuming additional roles relative to
today's operational vehicles. Such roles include stability augmentation systems (SAS) and maneuver load
alleviation (MLA) systems plus other requirements which may require the DFCS to have flight safety
reliability over significant portions of the flight envelope. Providing such reliability while
simultaneously striving to minimize hardware redundancy levels has led to the development of
sophisticated DFCS architectures. Some promising system architectures have included in their redundancy
management philosophies the ability to isolate failures to a particular line replaceable unit (LRU) and
to select that LRU successfully to the simplex level. The ability to redundancy manage LRU's in this
fashion requires that the architecture provide the transfer of data (from redundant LRU's) between
channels (see Figure 1), and that the selection of one healthy LRU from two choices be achievable. The
probability of selecting one healthy LRU when one of two redundant LRU's has failed is called "failure
coverage" or just "coverage".


Figure 1. Duplex DFCS With Interchannel Communication


The concept of various degrees of "dependency" also arises with such architectures. An LRU has a
dependency when it must rely upon one or more other LRU's to operate successfully before that LRU can
accomplish its function in the system. For example, Figure 1 shows that the transfer of sensor data
between channels depends upon successful operation of its input and DCU.

Conventional reliability analysis techniques such as fault-tree and Boolean algebra methods become
extremely difficult to use for complex architectures possessing redundancy with dependencies and
coverage. The mathematics becomes massive with high probability for error. The Stage-State reliability
analysis technique, on the other hand, is quite simple while possessing the following features:

a) accounts for redundancy level for each specific LRU,

b) defines the collection of probability states which represent a desired success or failure
criterion,

c) a probability of success P(S) equation can be written directly from a success logic diagram
which includes the effects of dependencies,

d) the P(S) equation is more compact and requires minimal memory for a digital evaluation relative
to competing methods,

e) multiple success criteria are easily evaluated by simply deleting those states (terms) which do
not satisfy the new success criteria, and

f) the effect of failure coverage is easily incorporated in the P(S) equation.




2.0  THE  METHOD 

2.1  Concept  Definition 

The Stage-State method, described below, was developed by Mr. dlmy Rice of The Boeing Military Airplane
Company in 1978 to support the systematic analysis of a variety of DFCS architectures with a large
number of system elements (Reference 1). The need he fulfilled was to provide a reliability analysis
tool that the system designer could easily use to conduct system architecture trade studies.

The method is based upon straightforward use of set theory and axioms of probability. It considers a
space S which contains all possible outcomes of the system and breaks them up into mutually exclusive
events, or states. The sum of the probabilities of all such states must therefore be unity.

Consider the following example of a system consisting of a duplex stage.

A "stage" is defined as a set of like redundant elements (LRU's) as shown in Figure 2.

Figure 2. Duplex Stage

This figure illustrates a success logic diagram; i.e., success consists of the chain
between points 1 and 2 not being totally broken due to failures. This stage consists of four
independent states. A "state" defines a particular combination of failed and/or healthy LRU's of a
given stage. The possible duplex states are


    Both healthy             = R_A R_B = ST1 = State 1 = P^{AB}_{DUP}(1)
    "A" healthy, "B" failed  = R_A Q_B = ST2 = State 2 = P^{AB}_{DUP}(2)
    "A" failed, "B" healthy  = Q_A R_B = ST3 = State 3 = P^{AB}_{DUP}(3)
    Both failed              = Q_A Q_B = ST4 = State 4 = P^{AB}_{DUP}(4)              (1)

where R_i and Q_i are the probabilities that the ith LRU is good or bad, respectively. The
probability that the stage is good or bad is

    P(S) = 1.0 = R_A R_B + R_A Q_B + Q_A R_B + Q_A Q_B              (2)

A success criterion can be applied to these states. If that criterion is that either A or B good
represents success, then

    P(success) = R_A R_B + R_A Q_B + Q_A R_B              (3)

If success says that both must be good, then only state 1 applies; i.e., P(success) = R_A R_B.

The Stage-State technique employs conditional probability to adjust success criteria. Let S be defined
as the success function. The probability of S occurring for a duplex stage is

    P(S) = sum_{i=1}^{4} P(S/STi) P(STi)              (4)

where P(S/STi) is the conditional probability of success given the stage is in state i (STi). For the
duplex stage, where either element healthy constitutes success, equations (1) indicate

    P(S/ST1) = P(S/ST2) = P(S/ST3) = 1
    P(S/ST4) = 0              (5)

Therefore, P(S) for duplex stage AB is the probability that that stage is in states 1, 2, or 3 duplex.

    P(S) = P^{STG AB}_{DUP}(1,2,3) = 1 P(ST1) + 1 P(ST2) + 1 P(ST3) = R_A R_B + R_A Q_B + Q_A R_B              (6)

If elements A and B are identical, then defining the reliability of LRU A as R_A and the probability
of failure of A as Q_A we get

    P(S) = R_A^2 + 2 R_A Q_A              (7)

or       = 2 R_A - R_A^2              (8)

where    Q_A = 1 - R_A              (9)
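The duplex-stage bookkeeping above can be reproduced by enumerating the four states and summing only those that satisfy a chosen success criterion. This is a minimal sketch with hypothetical helper names (`duplex_states`, `p_success`); the reliability value 0.9 used in the example is arbitrary and not taken from the paper.

```python
from itertools import product
from typing import Callable, Dict, Tuple

State = Tuple[bool, bool]   # (A healthy?, B healthy?)

def duplex_states(r_a: float, r_b: float) -> Dict[State, float]:
    """Probability of each of the four duplex states, as in equations (1)."""
    states: Dict[State, float] = {}
    for a_ok, b_ok in product((True, False), repeat=2):
        p_a = r_a if a_ok else 1.0 - r_a    # R_A or Q_A
        p_b = r_b if b_ok else 1.0 - r_b    # R_B or Q_B
        states[(a_ok, b_ok)] = p_a * p_b
    return states

def p_success(states: Dict[State, float],
              criterion: Callable[[bool, bool], bool]) -> float:
    """Sum the states that satisfy the criterion; the others are deleted."""
    return sum(prob for state, prob in states.items() if criterion(*state))

if __name__ == "__main__":
    r = 0.9
    states = duplex_states(r, r)
    print(sum(states.values()))                        # all four states together
    print(p_success(states, lambda a, b: a or b))      # either LRU good
    print(p_success(states, lambda a, b: a and b))     # both LRU's good
```

Changing the success criterion only changes which states are kept in the sum, which is the property the method relies on when several criteria are applied to one formulation.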

2.2 Effect of Dependencies

Consider the duplex system shown in Figure 3 which has three duplex stages (A, B, and C) with C being a
dependency for stages A and B. Success is defined as getting information to the success node, S.

Figure 3. Duplex System With Dependency (Stage C) [panels: System Diagram; Success Logic Diagram]

Two success states will be defined for each stage: both LRU's good (ST1), and either LRU good (ST2).

The probability of success can be written as

    P(S) = P(S/C_{ST1}) P^{STG C}_{DUP}(1) + P(S/C_{ST2}) P^{STG C}_{DUP}(2)              (10)

This equation examines success based upon the most dependent stage first. It reads: P(S) equals the
probability of success given stage C is in state 1 (duplex) times the probability that stage C is in state
1, plus the probability of success given stage C is in state 2 (duplex) times the probability that stage
C is in state 2.

To examine the first term we can redraw Figure 3 assuming stage C is in state 1; i.e., both good.

Figure 4. Duplex System With Stage C in State 1 [panels: System Diagram; Success Logic Diagram]


From Figure 4 it is readily observed that the conditional success can now be defined as follows:

    P(S/ST1) = Probability of A1 or A2, and B1 or B2, good

    P(S/C_{ST1}) = P^{STG A}_{DUP}(1,2) P^{STG B}_{DUP}(1,2)              (11)

From equation (8) the following is written

                 = (2 R_A - R_A^2)(2 R_B - R_B^2)              (12)

Now let's evaluate the conditional probability where stage C is in state 2; i.e., only C1 or C2
good. Figure 5 illustrates this state.

Figure 5. Duplex System With Stage C in State 2 [panels: System Diagram; Success Logic Diagram]

From Figure 5 the probability of success given stage C is in state 2 is

    P(S/ST2) = P^{STG A}_{SIMP} P^{STG B}_{SIMP} = R_A R_B              (13)

That is, if C1 is healthy, success depends upon the probability that the elements A and B are healthy
in their simplex state. Substituting (12) and (13) into (10) gives the final result.

    P(S) = (2 R_A - R_A^2)(2 R_B - R_B^2) P^{STG C}(ST1) + R_A R_B P^{STG C}(ST2)              (14)

    with P^{STG C}(ST1) = R_C^2 and P^{STG C}(ST2) = 2 R_C Q_C = 2 R_C - 2 R_C^2

    P(S) = (2 R_A - R_A^2)(2 R_B - R_B^2) R_C^2 + R_A R_B (2 R_C - 2 R_C^2)              (15)


Equation (15) was derived with relative ease. A comparison with a Boolean algebra solution to the same problem is illustrated in Reference 1, which shows that three pages of detailed algebra were required to obtain a result identical to the Stage-State method.
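Equation (15) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the success rule used in the brute-force check (any A and any B usable when both C units are good, otherwise only the A and B of the channel whose C unit is good) is an assumption read from the state definitions above, not taken from Figure 3 itself.

```python
from itertools import product

def duplex_with_dependency(r_a, r_b, r_c):
    """Equation (15): duplex stages A and B that depend on duplex stage C."""
    q_c = 1.0 - r_c
    both_c_good = r_c ** 2           # stage C in state 1
    one_c_good = 2.0 * r_c * q_c     # stage C in state 2
    return ((2 * r_a - r_a ** 2) * (2 * r_b - r_b ** 2) * both_c_good
            + r_a * r_b * one_c_good)

def brute_force(r_a, r_b, r_c):
    """Add up the probability of every good/bad pattern of the six LRU's that succeeds."""
    total = 0.0
    for a1, a2, b1, b2, c1, c2 in product((True, False), repeat=6):
        if c1 and c2:
            ok = (a1 or a2) and (b1 or b2)
        else:
            # assumed rule: only the channel with a healthy C unit is usable
            ok = (c1 and a1 and b1) or (c2 and a2 and b2)
        if ok:
            p = 1.0
            for good, r in ((a1, r_a), (a2, r_a), (b1, r_b),
                            (b2, r_b), (c1, r_c), (c2, r_c)):
                p *= r if good else 1.0 - r
            total += p
    return total
```

Under these assumptions the enumeration of all 64 LRU patterns and the two-term Stage-State expression agree to rounding error for any choice of reliabilities.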


2.3 A More Complex Example


A duplex DFCS will now be evaluated. This system, shown in Figure 6, has duplex stages for all LRU's; i.e., sensors (A and B), input sections (IN), digital control units (DCU), output sections (OUT), and control surface servos (AIL, RUD, and ELE). The DCU's have an interchannel communication capability. This DFCS has three dependencies. The most dependent element is the DCU since its loss constitutes loss of a full channel. The input and output dependency impacts the use of one full sensor or servo set. This is illustrated by the associated success logic diagram shown in Figure 7. This logic diagram can be used to define success states subject to various levels of conditional probabilities as defined below.

Figure 6. Dual DFCS (servos driving the elevator, rudder, and aileron surfaces)

Figure 7. Success Logic Diagram for the Dual DFCS

The probability of system success P(S) can be written using repetitive application of the definition of conditional probability and the sub-division of the sample space into disjoint states. For the DCU stage, P(S) is

$$P(S) = P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{1}\big)\,P\big(\mathrm{DCU}^{\mathrm{DUP}}_{1}\big) + P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{2}\big)\,P\big(\mathrm{DCU}^{\mathrm{DUP}}_{2}\big) \qquad (16)$$


For the DCU in state 1 duplex, the system reduces to that shown in Figure 8, and $P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{1}\big)$ is defined from this diagram.

Figure 8. DFCS With DCU in State 1 (Both Good): input function and output function


Observe that the first subdivision of the sample space was to divide it into all of the possible states of the most dependent stage. Given that the DCU stage is in state 1 duplex, the system is reduced to one composed of independent input and output functions but which possess internal dependencies. The input function has the input stage as a dependency and the output function has the output stage as a dependency. Then

$$\begin{aligned}
P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{1}\big) &= P\big[(\text{INPUT FUNCTION GOOD}) \cdot (\text{OUTPUT FUNCTION GOOD})\big] \\
&= P(\text{INPUT FUNCTION GOOD}) \cdot P(\text{OUTPUT FUNCTION GOOD}) \\
&= P(\mathrm{INF})\,P(\mathrm{OUTF})
\end{aligned} \qquad (17)$$


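The input function is now subdivided into a reduced set of disjoint sample spaces.

$$P(\mathrm{INF}) = P\big(\mathrm{INF} \mid \mathrm{IN}^{\mathrm{DUP}}_{1}\big)\,P\big(\mathrm{IN}^{\mathrm{DUP}}_{1}\big) + P\big(\mathrm{INF} \mid \mathrm{IN}^{\mathrm{DUP}}_{2}\big)\,P\big(\mathrm{IN}^{\mathrm{DUP}}_{2}\big) \qquad (18)$$

$$P\big(\mathrm{INF} \mid \mathrm{IN}^{\mathrm{DUP}}_{1}\big) = P\big(A^{\mathrm{DUP}}_{1,2}\big)\,P\big(B^{\mathrm{DUP}}_{1,2}\big) \qquad (19)$$

$$P\big(\mathrm{INF} \mid \mathrm{IN}^{\mathrm{DUP}}_{2}\big) = P\big(A^{\mathrm{SIMP}}_{1}\big)\,P\big(B^{\mathrm{SIMP}}_{1}\big) \qquad (20)$$

therefore,

$$P(\mathrm{INF}) = P\big(A^{\mathrm{DUP}}_{1,2}\big)\,P\big(B^{\mathrm{DUP}}_{1,2}\big)\,P\big(\mathrm{IN}^{\mathrm{DUP}}_{1}\big) + P\big(A^{\mathrm{SIMP}}_{1}\big)\,P\big(B^{\mathrm{SIMP}}_{1}\big)\,P\big(\mathrm{IN}^{\mathrm{DUP}}_{2}\big) \qquad (21)$$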

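In a similar fashion the output function is subdivided into its reduced sample space.

P(Output Function Is Good) = P(OUTF)

$$P(\mathrm{OUTF}) = P\big(\mathrm{OUTF} \mid \mathrm{OUT}^{\mathrm{DUP}}_{1}\big)\,P\big(\mathrm{OUT}^{\mathrm{DUP}}_{1}\big) + P\big(\mathrm{OUTF} \mid \mathrm{OUT}^{\mathrm{DUP}}_{2}\big)\,P\big(\mathrm{OUT}^{\mathrm{DUP}}_{2}\big) \qquad (22)$$

$$P\big(\mathrm{OUTF} \mid \mathrm{OUT}^{\mathrm{DUP}}_{1}\big) = P\big(\mathrm{AIL}^{\mathrm{DUP}}_{1,2}\big)\,P\big(\mathrm{RUD}^{\mathrm{DUP}}_{1,2}\big)\,P\big(\mathrm{ELE}^{\mathrm{DUP}}_{1,2}\big) \qquad (23)$$

$$P\big(\mathrm{OUTF} \mid \mathrm{OUT}^{\mathrm{DUP}}_{2}\big) = P\big(\mathrm{AIL}^{\mathrm{SIMP}}_{1}\big)\,P\big(\mathrm{RUD}^{\mathrm{SIMP}}_{1}\big)\,P\big(\mathrm{ELE}^{\mathrm{SIMP}}_{1}\big) \qquad (24)$$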

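Substituting (23) and (24) into (22) and then (22) and (21) into (17) gives

$$\begin{aligned}
P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{1}\big) = {} & \Big[ P\big(A^{\mathrm{DUP}}_{1,2}\big) P\big(B^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{IN}^{\mathrm{DUP}}_{1}\big) + P\big(A^{\mathrm{SIMP}}_{1}\big) P\big(B^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{IN}^{\mathrm{DUP}}_{2}\big) \Big] \\
& \times \Big[ P\big(\mathrm{AIL}^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{RUD}^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{ELE}^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{OUT}^{\mathrm{DUP}}_{1}\big) + P\big(\mathrm{AIL}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{RUD}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{ELE}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{OUT}^{\mathrm{DUP}}_{2}\big) \Big]
\end{aligned} \qquad (25)$$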

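Looking at the conditional probability $P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{2}\big)$ from equation (16), it can be seen that the success path is now one simplex channel so that

$$P\big(S \mid \mathrm{DCU}^{\mathrm{DUP}}_{2}\big) = P\big(A^{\mathrm{SIMP}}_{1}\big) P\big(B^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{IN}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{OUT}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{AIL}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{RUD}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{ELE}^{\mathrm{SIMP}}_{1}\big) \qquad (26)$$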


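Substitution of equations (25) and (26) into (16) provides the final result.

$$\begin{aligned}
P(S) = {} & \Big\{ \Big[ P\big(A^{\mathrm{DUP}}_{1,2}\big) P\big(B^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{IN}^{\mathrm{DUP}}_{1}\big) + P\big(A^{\mathrm{SIMP}}_{1}\big) P\big(B^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{IN}^{\mathrm{DUP}}_{2}\big) \Big] \\
& \times \Big[ P\big(\mathrm{AIL}^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{RUD}^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{ELE}^{\mathrm{DUP}}_{1,2}\big) P\big(\mathrm{OUT}^{\mathrm{DUP}}_{1}\big) + P\big(\mathrm{AIL}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{RUD}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{ELE}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{OUT}^{\mathrm{DUP}}_{2}\big) \Big] \Big\}\, P\big(\mathrm{DCU}^{\mathrm{DUP}}_{1}\big) \\
& + P\big(A^{\mathrm{SIMP}}_{1}\big) P\big(B^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{IN}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{AIL}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{RUD}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{ELE}^{\mathrm{SIMP}}_{1}\big) P\big(\mathrm{OUT}^{\mathrm{SIMP}}_{1}\big)\, P\big(\mathrm{DCU}^{\mathrm{DUP}}_{2}\big)
\end{aligned} \qquad (27)$$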


The various reliability expressions with appropriate failure rates and exposure times may be substituted into each state's probability factor in equation (27) to obtain a numerical result.
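As a sketch of such a substitution, the following evaluates equation (27) for identical LRU's within each stage. The exponential reliability model R = exp(-λt), the function names, and the dictionary layout are assumptions made for illustration; the state probabilities follow the pattern of equations (12) to (14) (both good R², exactly one good 2RQ, at least one good 2R - R²).

```python
import math

def reliability(failure_rate, hours):
    """Assumed constant-failure-rate model: R = exp(-lambda * t)."""
    return math.exp(-failure_rate * hours)

def dup_both(r):
    """Duplex stage in state 1: both LRU's good."""
    return r * r

def dup_one(r):
    """Duplex stage in state 2: exactly one LRU good."""
    return 2.0 * r * (1.0 - r)

def dup_any(r):
    """Duplex stage in state 1 or 2: at least one LRU good."""
    return 2.0 * r - r * r

def dfcs_success(r):
    """Equation (27); r maps each stage name to the reliability of one LRU."""
    inf = (dup_any(r["A"]) * dup_any(r["B"]) * dup_both(r["IN"])
           + r["A"] * r["B"] * dup_one(r["IN"]))
    outf = (dup_any(r["AIL"]) * dup_any(r["RUD"]) * dup_any(r["ELE"])
            * dup_both(r["OUT"])
            + r["AIL"] * r["RUD"] * r["ELE"] * dup_one(r["OUT"]))
    simplex_channel = (r["A"] * r["B"] * r["IN"] * r["OUT"]
                       * r["AIL"] * r["RUD"] * r["ELE"])
    return (inf * outf * dup_both(r["DCU"])
            + simplex_channel * dup_one(r["DCU"]))

STAGES = ("A", "B", "IN", "DCU", "OUT", "AIL", "RUD", "ELE")

# Hypothetical example: every LRU has a failure rate of 1e-4 per hour, 10 hour exposure.
example = {stage: reliability(1.0e-4, 10.0) for stage in STAGES}
p_success = dfcs_success(example)
```

The failure rate and exposure time in the example are placeholders, not values from the paper.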

2.4  Effect  of  Failure  Coverage 

Failure coverage is defined as the probability of successfully detecting a failure within a redundant stage, isolating that failure to the specific LRU, and reconfiguring the stage to place the failed LRU off-line. It is generally accepted that coverage values of unity are possible with 3 or more healthy redundant LRU's. The problem arises when a failure occurs when just previously there were only 2 healthy LRU's: which has failed? Therefore, the coverage factor (c) is used to modify the probability of achieving the simplex state.

The Stage-State method includes coverage by splitting the simplex state for a stage into two parts: that which successfully degrades to simplex, and that which does not. This is now illustrated for a duplex stage. Equation (2) is rewritten below for a duplex stage with identical LRU's.


$$P(\text{good or bad}) = 1.0 = \underbrace{R^2}_{\mathrm{ST1}} + \underbrace{2RQ}_{\mathrm{ST2}} + \underbrace{Q^2}_{\mathrm{ST3}} \qquad (28)$$

If success is defined as having at least 1 of the 2 LRU's healthy, then the success states include states 1 and 2 only. State 2 represents the 2 ways in which simplex operation can be achieved. When coverage is not unity, the probability of achieving this state is 2RQc. In this process a new unsuccessful state has evolved, namely 2RQ(1-c). Now

$$P(\text{good or bad}) = 1.0 = \underbrace{R^2 + 2RQc}_{\text{Success States}} + \underbrace{2RQ(1-c) + Q^2}_{\text{Failure States}} \qquad (29)$$

where the two terms 2RQc and 2RQ(1-c) were previously 1 state.


A triplex stage can be evaluated in a similar fashion. With unity coverage the triplex states are shown by equation (30).

$$P(\text{good or bad}) = 1.0 = \underbrace{R^3}_{\mathrm{ST1}} + \underbrace{3R^2Q}_{\mathrm{ST2}} + \underbrace{3RQ^2}_{\mathrm{ST3}} + \underbrace{Q^3}_{\mathrm{ST4}} \qquad (30)$$


If success is again defined as successfully achieving at least simplex operation, then the first 3 states represent success. State 3, however, must be modified by c if the coverage is not unity so that

$$P(\text{good or bad}) = 1.0 = \underbrace{R^3 + 3R^2Q + 3RQ^2c}_{\text{Success States}} + \underbrace{3RQ^2(1-c) + Q^3}_{\text{Failure States}} \qquad (31)$$

where the two terms 3RQ²c and 3RQ²(1-c) were previously 1 state.
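The duplex and triplex cases follow one pattern, which can be sketched as below. The function names are hypothetical, and the generalisation to n identical LRU's is an assumption that simply follows equations (29) and (31): only the state with exactly one healthy LRU is weighted by the coverage factor c.

```python
from math import comb

def stage_states(n, r):
    """Binomial state probabilities for n identical LRU's; index j = number failed."""
    q = 1.0 - r
    return [comb(n, j) * r ** (n - j) * q ** j for j in range(n + 1)]

def success_with_coverage(n, r, c):
    """Probability of achieving at least simplex operation, per (29) and (31)."""
    states = stage_states(n, r)
    unity_coverage_states = sum(states[: n - 1])  # two or more LRU's still healthy
    simplex_state = states[n - 1]                 # exactly one LRU healthy
    return unity_coverage_states + c * simplex_state

duplex = success_with_coverage(2, 0.9, 0.5)    # R^2 + 2RQc
triplex = success_with_coverage(3, 0.9, 0.5)   # R^3 + 3R^2Q + 3RQ^2c
```

With c = 1 the expressions reduce to the success states of equations (28) and (30).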


3.0  Conclusion 

Traditional reliability analysis methods are error-prone and difficult to use for complex flight control systems possessing a large number of LRU types and the redundancy management of individual LRU's. This is primarily due to the large number of combinations of possible success states and dependencies. The Stage-State reliability analysis method is a much simpler approach which is well-suited to use by the system architect. The method makes it readily apparent where the sources of the system unreliabilities are located. Also, because of its simplicity, fewer errors arise and the use of small portable computers is possible.

4.0  References 

1) Rice, J.W., 20 Dec 1979, Digital Flight Control Reliability - Effects of Redundancy Level, Architecture, and Redundancy Management Technique, Seattle, Washington, Boeing Document D180-25578-1.

Also published as technical paper without the Stage-State Reliability Analysis appendix as:

Rice, J.W. and McCorkle, R.D., August 1979, Digital Flight Control Reliability - Effects of Redundancy Level, Architecture, and Redundancy Management Technique, Boulder, Colorado, AIAA Guidance and Control Conference, AIAA paper #79-1893.




METHODOLOGY FOR MEASUREMENT OF FAULT LATENCY IN A DIGITAL AVIONIC MINIPROCESSOR*

John G. McGough
Fred Swern

Flight Systems Division
Bendix Corporation
Teterboro, New Jersey 07608

and 

Salvatore  J.  Bavuso 
NASA  Langley  Research  Center 
Hampton,  Virginia  23665 


ABSTRACT 


Using a gate-level emulation of a typical avionics miniprocessor, fault injection experiments were performed to (1) determine the time-to-detect a fault by comparison-monitoring, (2) forecast a program's ability to detect faults and (3) validate the fault detection coverage of a typical self-test program.

To estimate time-to-detect, six programs, ranging in complexity from 6 to 147 instructions, were emulated. Each program was executed repetitively in the presence of a single stuck-at fault at a gate node or device pin. Detection was assumed to occur whenever the computed outputs differed from the corresponding outputs of the same program executed in a non-faulted processor. Histograms of faults detected versus number of repetitions to detection were tabulated.

Using a simple model of fault detection, which was based on an analogy with the selection of balls in an urn, distributions of time-to-detect were computed and compared with those obtained empirically.

A self-test program of 2,000 executable instructions was designed expressly for the study. The only requirement imposed on the design was that it should achieve 95% coverage. The program was executed in the presence of a single stuck-at fault at a gate node or device pin. The proportion of detected faults was tabulated.

In all experiments faults were selected at random over gate nodes or device pins.

1. INTRODUCTION

1.1  Background 

NASA's Langley Research Center has been actively pursuing the synthesis of a reliability assessment capability for fault-tolerant computer-based systems for several years. This work has culminated in the development of CARE III (Computer-Aided Reliability Estimation), a general purpose reliability assessment tool for highly reliable fault-tolerant systems, tailored toward flight crucial avionic systems employing multiple digital computers. A major innovation of CARE III is its treatment of coverage, which is a vital factor in the reliability modeling of digital fault-tolerant computer systems. Coverage, a generic term, captures the notion of a system's ability to handle hardware faults and involves system fault detection, isolation of the fault to a reconfigurable (redundant) hardware module, and fault reconfiguration and recovery. The first two components have been modeled extensively and have been shown to be critical for achieving high system reliabilities.

What is also evident in the literature is a lack of empirical coverage data, although several very powerful reliability evaluators require this data. As a result, a pilot study was conducted in 1978 to test the feasibility of measuring detection coverage and investigating the dynamics of fault propagation in a digital computer. The specific objectives were to study how typical software causes stuck-at faults to propagate and hence become detectable, to account for as many software code characteristics (e.g., instruction subset, branching) as possible affecting detection (with an eye toward optimizing fault detection by code synthesis), and to determine a method of forecasting a given software program's detecting ability prior to computation. A series of fault injection experiments was conducted using a gate-level simulation of a small idealized processor with a limited instruction set. The results of the study were surprising since they contradicted the prevailing belief that most hardware faults cause catastrophic and hence detectable computational errors. In fact, a significant proportion of faults remained latent after many repetitions of a program. These observations can have a significant impact on the design of fault-tolerant digital computers which employ comparison-monitoring or majority-voting for fault detection and isolation. The risk is associated with the accumulation of latent and therefore undetected faults which may defeat the comparison-monitoring or majority-voting detection schemes. These considerations are of paramount importance to reliability assessment. As a result, NASA funded another study to investigate the findings of the pilot study, since it was not clear from the pilot study that similar results could be obtained for a real processor; the follow-on work was based on a real avionic processor. This work was also extended to evaluate an airborne self-test program, to account for undetected faults, and to assess the significance of injecting faults at the gate-level and at the functional pin-level.


* The contents are based on the study:

"Methodology for Measurement of Fault Latency in a Digital Avionic Miniprocessor", NAS 1-15946, Flight Systems Division, Bendix Corporation, sponsored by Langley Research Center, Hampton, Va.




1.2  Objectives  of  the  Study 

A primary objective of the present study is to ascertain whether the results of the previous study apply to a real avionics processor. Specifically,

• Given a set of software programs ranging from a simple "fetch and store" to a complicated, multi-instruction algorithm, inject a single fault, selected at random, and observe the time to detection. Detection is assumed to occur whenever there is a difference between the computed outputs of the faulted and non-faulted processors executing the same program. Determine differences in detection time when faults are injected at the gate-level and component-level.

• Based upon empirical distributions, develop and validate a model of fault latency that will forecast a program's fault detecting ability.

The following additional objective was added:

• Given a typical avionics self-test program, inject faults at both the gate-level and component-level and determine the proportion of faults detected.

2.  EMULATION  DESCRIPTION 

2.1 BDX-930 Architecture

The Bendix BDX-930 Digital Processor is a microprogrammed, pipelined machine designed around the AMD2901A four bit microprocessor slice. The machine contains sixteen general purpose registers of which four registers may be loaded directly from memory and two registers may be used as base registers. One register is used as a stack pointer.

The program counter and memory address register are contained in the 9407, a chip designed to perform memory address arithmetic. Along with a temporary register contained on the same chip, the BDX-930 is able to perform four basic addressing modes involving three registers and various instruction fields.

The machine contains three memory interface data registers which are used to input and output memory data. There are also a number of one bit status flag registers that can be manipulated under program control. These include the F1 and F2 registers, which are hardware flags, and the interrupt enable and overflow status registers. There also exist the indirect and link registers used by the microcode for branching.

The microcode is contained in seven proms and a pipeline register is included for simultaneous microcode fetch and decoding. Various internal and external conditions can affect microcode branching as selected by the microcode itself and a microcode control prom. In addition to a rich instruction set which includes 16 and 32 bit fixed point operations, there is a test set interface in the microcode. A selectable saturate mode is available which limits the results of arithmetic operations when overflow or underflow occur.

For simulation purposes, the computer has been divided into six partitions:

1. Address Processor
2. Data and Status Registers
3. Microcontroller
   Pipeline Register
4. ALU (2901A)
5. Microcode
6. Control Proms

The partitioning is roughly equivalent to the stages of the pipe: address, fetch, decode, and execute. These stages of the pipe are joined by various buses throughout the CPU. These buses are formed from tri-state logic and some are bidirectional.

A list of the devices used in the BDX-930 and their failure rates is given in Table 1, obtained from MIL-HDBK-217B, Notice 2.

2.2 Description of the Emulator

The emulation includes the components of the CPU (Central Processor Unit), scratchpad memory and those portions of the program memory containing six target programs and the target self-test program. The emulation is derived from the circuit schematics of the BDX-930 and includes all of the devices identified in those schematics. Each device is represented by a gate-level equivalent circuit supplied by the chip manufacturer. It was found that six types of gates were sufficient to represent any device, namely NAND, AND, OR, NOT, NOR, EXCLUSIVE OR. Table 2 gives the number of equivalent gates in each device of the CPU. In all, 5,100 gates were required.




All devices of the CPU were represented at the gate-level except the following:

16 general purpose arithmetic registers
program memory
scratchpad memory
microprogram and control memories

which are represented at the functional level.

The emulation did not include the direct memory access unit (DMA) or any of the devices of the I/O. The emulated devices of the CPU are shown in Figure 1.

Faults were injected into all devices except the program and scratchpad memories. Because the program memory is "read-only", no processor, faulted or not, is permitted to write into this memory. However, even though the scratchpad memory is never faulted, a faulty processor can write into it. As a consequence, in the parallel mode of operation where 36 processors are simultaneously emulated, the corresponding 36 scratchpad memories are also emulated.

No delay has been simulated between logic gates. It is assumed that all combinational logic is stable at the output the instant an input pattern is applied to it. This means that each time the input is changed, the network need only be evaluated once to supply the correct output pattern. Operating in this manner is very time efficient, but puts stringent requirements on the order of evaluation of the gates.

To be able to meet these requirements, the logic is levelized, i.e., placed in groups or levels that represent the proper order of evaluation.
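Levelization can be sketched as follows. This is a generic illustration of the idea, not the emulator's actual code; the netlist format, the function names, and the three-gate example circuit are all assumptions.

```python
def levelize(gates, primary_inputs):
    """Order gates so that each gate is evaluated after all of its inputs.

    gates maps an output signal name to (operation, list of input signal names).
    Returns the evaluation order and the level assigned to every signal.
    """
    level = {name: 0 for name in primary_inputs}
    remaining = dict(gates)
    order = []
    while remaining:
        ready = [out for out, (_, ins) in remaining.items()
                 if all(i in level for i in ins)]
        if not ready:
            raise ValueError("feedback loop or undriven signal")
        for out in ready:
            level[out] = 1 + max((level[i] for i in remaining[out][1]), default=0)
            order.append(out)
        for out in ready:
            del remaining[out]
    return order, level

OPS = {
    "AND": lambda v: all(v),
    "OR": lambda v: any(v),
    "NOT": lambda v: not v[0],
    "NAND": lambda v: not all(v),
    "NOR": lambda v: not any(v),
    "XOR": lambda v: sum(v) % 2 == 1,
}

def evaluate(gates, order, inputs):
    """Zero-delay evaluation: one pass over the levelized gate list."""
    values = dict(inputs)
    for out in order:
        op, ins = gates[out]
        values[out] = OPS[op]([values[i] for i in ins])
    return values

# Hypothetical half adder; "carry" is listed first but must be evaluated after "n1".
half_adder = {
    "carry": ("NOT", ["n1"]),
    "n1": ("NAND", ["a", "b"]),
    "sum": ("XOR", ["a", "b"]),
}
order, level = levelize(half_adder, ["a", "b"])
result = evaluate(half_adder, order, {"a": True, "b": True})
```

Because every gate appears after its drivers in `order`, a single pass yields the settled outputs, which is the property the zero-delay assumption relies on.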

The emulator utilized the parallel method of logic simulation (see, for instance, (Seshu, S., et al 1962; Hardie, F.H., et al 1967)). The data word of a PDP-10 contains 36 bits; each bit position is used to represent a different machine. The simplest gate operations are represented by a single Boolean instruction; when the two inputs occupy the same bit positions in their respective words, the output also occupies this bit position. The advantage of this technique is execution time savings. Typically, the amount of code necessary to simulate 36 machines is of the same order as the amount of code necessary to simulate only one machine. For an additional increase in speed the BDX-930 description is contained in compiled code, rather than in tables.
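The parallel method can be illustrated with ordinary integers standing in for 36-bit PDP-10 words. This is a minimal sketch under that assumption; the helper names are hypothetical.

```python
WORD_BITS = 36                     # one bit position per emulated machine
MASK = (1 << WORD_BITS) - 1        # keeps results within a 36-bit word

def p_and(a, b):
    return a & b

def p_or(a, b):
    return a | b

def p_xor(a, b):
    return a ^ b

def p_not(a):
    return ~a & MASK

def p_nand(a, b):
    return ~(a & b) & MASK

def broadcast(bit):
    """Give all 36 machines the same logic value."""
    return MASK if bit else 0

# Bit m of each word is the signal value seen by machine m.
a = 0b0011
b = 0b0101
and_out = p_and(a, b)      # machines 0..3 compute 1&1, 1&0, 0&1, 0&0 at once
xor_out = p_xor(a, b)
```

Each call performs one Boolean operation on whole words, so the cost of simulating a gate is the same for one machine as for thirty-six.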

Certain portions of the machine, notably the memory elements, were represented at a functional level rather than a gate level. For microprogram memory, two words of PDP-10 storage contain 56 bits of microstore; at micro memory fetch time, these bits are retrieved from the proper address for each of the simulated machines and combined to form suitable words to interface the gate portion of the emulation. The ROM portion of macro memory is handled in the same manner. Writeable store contains a routine to translate the gate inputs into consecutive PDP-10 storage words so that there is one copy of writeable storage for each machine being emulated. On reading this storage, the process is reversed.

In a typical run of the emulator, 36 different machines are exercised: 35 faulted machines and one good machine. Each faulted machine is assumed to have a single solid fault at one node, either stuck-at-one (SA1) or stuck-at-zero (SA0). The faults are injected by defining extra gates at each node, an AND gate for stuck at zero and an OR gate for stuck at one. A typical AND gate using this technique is shown in Figure 2.
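In the parallel representation the extra gates amount to one AND mask and one OR mask per node. The sketch below assumes that bit 0 holds the good machine and bits 1 to 35 the faulted machines; that assignment, the class, and the function names are illustrative choices, not details given in the text.

```python
WORD_BITS = 36
MASK = (1 << WORD_BITS) - 1

class FaultableNode:
    """A node followed by two extra gates: AND for stuck-at-0, OR for stuck-at-1."""

    def __init__(self):
        self.sa0_mask = MASK   # a 0 bit forces that machine's node to 0
        self.sa1_mask = 0      # a 1 bit forces that machine's node to 1

    def inject(self, machine, stuck_at):
        bit = 1 << machine
        if stuck_at == 0:
            self.sa0_mask &= ~bit
        else:
            self.sa1_mask |= bit

    def apply(self, word):
        return (word & self.sa0_mask) | self.sa1_mask

def differing_machines(word):
    """Machines whose value differs from machine 0 (assumed fault-free)."""
    reference = MASK if word & 1 else 0
    diff = word ^ reference
    return [m for m in range(WORD_BITS) if (diff >> m) & 1]

node = FaultableNode()
node.inject(1, 0)    # machine 1: this node stuck-at-0
node.inject(2, 1)    # machine 2: this node stuck-at-1
when_high = differing_machines(node.apply(MASK))   # good value is 1
when_low = differing_machines(node.apply(0))       # good value is 0
```

A stuck-at-0 fault is visible only while the good value is 1, and a stuck-at-1 fault only while it is 0, which is why a fault can stay latent until the program drives the node to the opposite value.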

An additional reduction in run-time can be achieved by observing that not all gate faults are distinguishable at the gate output. For example, an SA0 fault on the input node of an AND gate is indistinguishable from an SA0 fault on the output node. As a consequence, if two or more indistinguishable faults of the same gate are selected, only one fault will be emulated.
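The AND-gate example can be verified by comparing truth tables. The following is a small sketch with hypothetical fault labels; it groups faults of one two-input AND gate by their behaviour at the gate output and keeps one representative per group.

```python
from itertools import product

def and_gate(a, b, fault=None):
    """Two-input AND gate with an optional stuck-at-0 fault on one of its nodes."""
    if fault == "input_a_sa0":
        a = 0
    if fault == "input_b_sa0":
        b = 0
    out = a & b
    if fault == "output_sa0":
        out = 0
    return out

def truth_table(fault):
    return tuple(and_gate(a, b, fault) for a, b in product((0, 1), repeat=2))

def collapse(faults):
    """Keep one representative per group of faults with identical output behaviour."""
    representatives = {}
    for fault in faults:
        representatives.setdefault(truth_table(fault), fault)
    return list(representatives.values())

selected = ["input_a_sa0", "input_b_sa0", "output_sa0"]
to_emulate = collapse(selected)
```

All three stuck-at-0 faults force the output to 0 for every input pattern, so only the first one needs to be emulated.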

It will be noted that only one partition of the BDX-930 runs with faults injected in each simulated run. The remaining partitions run 'true value', that is, logic without fault injection capabilities. This results in a time saving in program execution. When the entire emulator is run true-value, the execution ratio between PDP-10 time and simulated time is 21,000:1; with faults injected in one partition, this number is approximately 25,000:1.

3.  FAULT  MODELLING  AND  SELECTION 

3.1  Fault  Model 

In the present study the following assumptions are made regarding failure modes:

• Every device can be represented, from the standpoint of performance and failure modes, by the manufacturer-supplied, gate-level equivalent circuit.

• Every fault can be represented as either an S-a-0 or S-a-1 fault at a gate node.

• The failure rate of the device is equally distributed over the gates of the equivalent circuit.

• The failure rate of a gate is equally distributed over the nodes of the gate.

• S-a-0 and S-a-1 faults are equally likely.

• Memory faults are exclusively faults of single bits.

• A memory fault is the complement of its non-faulted state.
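The equal-distribution assumptions translate directly into a weight for each candidate fault. The sketch below is illustrative only: the data layout, the function names, the example device, and the use of sampling with replacement are assumptions, and memory faults are left out.

```python
import random

def node_fault_weights(devices):
    """Spread each device's failure rate equally over gates, then nodes, then SA0/SA1.

    devices maps a device name to (failure_rate, list of node counts, one per gate).
    Returns {(device, gate index, node index, stuck value): failure rate}.
    """
    weights = {}
    for name, (rate, nodes_per_gate) in devices.items():
        per_gate = rate / len(nodes_per_gate)
        for g, n_nodes in enumerate(nodes_per_gate):
            per_fault = per_gate / n_nodes / 2.0
            for n in range(n_nodes):
                for stuck in (0, 1):
                    weights[(name, g, n, stuck)] = per_fault
    return weights

def select_faults(weights, count, seed=0):
    """Draw faults at random in proportion to their failure rates (with replacement)."""
    rng = random.Random(seed)
    faults = list(weights)
    return rng.choices(faults, weights=[weights[f] for f in faults], k=count)

# Hypothetical device: two gates with three nodes each, total failure rate 1.2.
example_devices = {"U1": (1.2, [3, 3])}
fault_weights = node_fault_weights(example_devices)
batch = select_faults(fault_weights, 35)   # one fault per faulted machine in a run
```

Summing the weights over all faults of a device recovers that device's failure rate, so the selection probabilities stay consistent with the device-level data.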




Faults are injected into all devices except the main memory. In the case of the microprogram memory, which is emulated at the functional level, faults are injected into the memory cells where they remain active for the duration of the test. Faults are injected at an input or output gate node, and also remain active for the duration of the test. When a fault is injected at an output node it is allowed to propagate to all nodes and devices that are physically connected to the failed node. When a fault is injected at an input node it does not propagate back to the driving node. This strategy provides a wider variety of failure modes than would otherwise be possible if propagation were allowed. The resultant fault set includes a rich assortment of static and dynamic (i.e., data-dependent) faults.

The above procedure does not distinguish between gate-level and component (i.e., pin)-level faults except by probability of occurrence; the method automatically assigns failure rates to pins. However, a different selection procedure was employed for component-level faults. For these faults it was assumed that the failure rate of each device is equally distributed over the pins.

While this assumption violates the prescribed fault model it is consistent with the conventional method of estimating fault detection coverage by simulating faults in actual hardware.

4.  DESCRIPTION  OF  EXPERIMENTS 

4.1  Definition  of  Failure  Detection 

In the present study fault coverage and latency estimates are obtained by employing two conventional techniques of failure detection: comparison-monitoring and self-test.

In comparison-monitoring a set of computed variables is compared with a corresponding set computed in another processor. If it is arranged that both processors operate on identical inputs and are closely synchronized, then any difference in a computed variable signifies that one of the processors has failed.

In practice each processor executes an algorithm which compares the appropriate variables and signals a discrepancy when such exists. In the present study this algorithm was omitted; a fault is considered to be detected if a difference between corresponding variables exists irrespective of the ability of either processor to recognize the difference or signal the discrepancy. Thus, the fault coverage obtained from the study is somewhat more optimistic than would be obtained in practice.

In self-test, on the other hand, each component of the processor is exercised by a set of computations designed specifically to test that component. The results of each computational set are compared with pre-stored values and any difference signifies that the fault was detected. In practice, and in the study, the processor increments a register after the successful completion of each test and before proceeding to the next test. If the test is not successful the program exits. After an interval of time equal to the maximum time to complete the program, the contents of the counter are decoded. If the value exactly equals the total number of tests, the fault was not detected. Otherwise the fault was detected.
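The counter-based self-test decision can be sketched as below. The representation of a test as a (computation, pre-stored value) pair and the function names are assumptions for illustration; the toy tests are not taken from the actual self-test program.

```python
def run_self_test(tests):
    """Run the tests in order, counting successes and exiting at the first failure."""
    counter = 0
    for compute, prestored in tests:
        if compute() != prestored:
            break            # the program exits when a test is not successful
        counter += 1         # incremented only after a successful test
    return counter

def fault_detected(counter, total_tests):
    """Decode the counter after the maximum program time has elapsed."""
    return counter != total_tests

# Toy example: the second computation of the "faulted" processor is wrong.
healthy = [(lambda: 2 + 2, 4), (lambda: 3 * 3, 9), (lambda: 7 - 6, 1)]
faulted = [(lambda: 2 + 2, 4), (lambda: 8, 9), (lambda: 7 - 6, 1)]

healthy_count = run_self_test(healthy)
faulted_count = run_self_test(faulted)
```

Decoding the counter from outside, rather than trusting the program to report its own failure, also catches faults that stop the program from reaching the end at all.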

4.2  Definition  of  Failure  Detection  Coverage 

We assume that a test procedure is given for detecting failures of a component, C. Each failure mode of C will require a non-zero time for detection. By considering all failures of C and all combinations of inputs and internal states of C, we obtain in principle, if not in practice, a probability density function for time-to-detect, which is measured from the onset of the failure to the time of detection. Denoting this density by $\mathrm{pdf}(\tau)$ where

$\tau$ = time-to-detect = latency time

we define

Test Coverage

1)  $1 - \alpha(\tau) = \int_0^{\tau} \mathrm{pdf}(x)\,dx$

    = probability of detecting a failure of C in the interval $0 < t < \tau$.

Observe that, according to this definition, test coverage is a function of latency time. The definition can be extended to all devices of the computer as follows:

Subdivide the computer into mutually exclusive components $C_1, C_2, \ldots, C_k$ with failure rates $\lambda_1, \lambda_2, \ldots, \lambda_k$ and test coverages $1 - \alpha_1(\tau), 1 - \alpha_2(\tau), \ldots, 1 - \alpha_k(\tau)$, respectively.

Set $\mathrm{pdf}_i(\tau)$ = probability density for time-to-detect failures of $C_i$, $i = 1, 2, \ldots, k$.

Then the pdf for all failures of the computer is

2)  $\mathrm{pdf}(\tau) = \sum_{i=1}^{k} \frac{\lambda_i}{\lambda}\,\mathrm{pdf}_i(\tau)$

where $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_k$.

Test coverage of the whole computer is then

3)  $1 - \alpha(\tau) = \sum_{i=1}^{k} \frac{\lambda_i}{\lambda}\,\big(1 - \alpha_i(\tau)\big)$.

From 3 we obtain

4)  $\alpha(\tau) = \sum_{i=1}^{k} \frac{\lambda_i}{\lambda}\,\alpha_i(\tau)$, as expected.

One of the objectives of the present study is to obtain estimates of the probability density function, $\mathrm{pdf}(\cdot)$. These estimates are presented in Section 7.
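Equation 3 is a failure-rate-weighted average, and the coverage at a given latency can be estimated from a list of observed detection times. The sketch below uses hypothetical function names and made-up numbers; it is not the estimation procedure of Section 7.

```python
def overall_coverage(failure_rates, component_coverages):
    """Equation 3: failure-rate-weighted average of the component coverages."""
    total = sum(failure_rates)
    return sum(lam / total * cov
               for lam, cov in zip(failure_rates, component_coverages))

def empirical_coverage(detect_times, tau):
    """Fraction of injected faults detected within tau, an estimate of 1 - alpha(tau).

    detect_times holds one entry per injected fault; None marks a fault that
    was never detected during the experiment.
    """
    detected = sum(1 for t in detect_times if t is not None and t <= tau)
    return detected / len(detect_times)

# Two components with failure rates 1 and 3 and coverages 1.0 and 0.5.
whole_computer = overall_coverage([1.0, 3.0], [1.0, 0.5])

# Four injected faults detected after 1, 2, never, and 5 repetitions.
within_two = empirical_coverage([1, 2, None, 5], tau=2)
```

Applying the same weighting to the complements gives equation 4, since the weights sum to one.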

4.3  Indistinguishable  Faults  and  Effects  on  Coverage 

During the development of the emulator it became apparent that a significant proportion of components had no effect whatsoever on the digital process. For the most part, these components are associated with unused pins, e.g., a complementary output of a flip-flop. However, there are other components whose lack of effect is not as obvious, for example, a component that only affects the process when it is faulted. Certain micromemory bits are in this category. In order to distinguish between these categories of faults we are led to the following informal definitions:

A fault that cannot be detected by any test sequence is indistinguishable. All other faults are distinguishable.

Effects on Coverage

The presence of indistinguishable faults can lead to erroneous and misleading estimates of coverage. In theory, indistinguishable faults should be disqualified from the emulation or from the fault selection process. This is consistent with the definition of coverage which implicitly assumes that all faults are distinguishable. Unfortunately, in order to disqualify indistinguishable faults from the emulation or from the fault selection process they must be first identified, a non-trivial task. The approach taken in this study was to select faults without regard to their distinguishability properties and analyze only those faults that were undetected by Self-Test. The proportion of indistinguishable faults from this set was then used as an estimate over all faults.


We now indicate, briefly, how indistinguishable faults affect coverage.


If

    $\gamma$ = proportion of components yielding indistinguishable faults

and

    $1 - \alpha$ = coverage of distinguishable faults

then

    $1 - \alpha$ = desired coverage

and

5)  $(1 - \alpha)(1 - \gamma)$ = coverage when indistinguishable faults are counted as undetected.

We note, incidentally, that

6)  $(1 - \alpha)(1 - \gamma) + \gamma$ = coverage when indistinguishable faults are counted as detected.

The estimate of (5) will be obtained if indistinguishable faults are not disqualified. Then, coverage
estimates will be in error by the factor $1 - \gamma$.
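Relations (5) and (6) and the resulting correction can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical, and `gamma` stands for the proportion of components yielding indistinguishable faults defined above.

```python
def raw_coverage_counted_undetected(coverage, gamma):
    """Relation (5): indistinguishable faults are counted as undetected."""
    return coverage * (1.0 - gamma)


def raw_coverage_counted_detected(coverage, gamma):
    """Relation (6): indistinguishable faults are counted as detected."""
    return coverage * (1.0 - gamma) + gamma


def corrected_coverage(raw_coverage, gamma):
    """Recover the desired coverage from a raw estimate of type (5)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return raw_coverage / (1.0 - gamma)
```

For example, a raw estimate obtained without disqualifying indistinguishable faults is divided by 1 - gamma to recover the coverage of distinguishable faults.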

One of the objectives of the experiments is to estimate $\tau$ for a variety of computations including
self-test. The Phase I experiments consist of six software programs ranging from a simple fetch and store to
a complicated multi-instruction, linear convergence algorithm. Using comparison-monitoring, the probability
distribution for $\tau$ will be estimated for each of the six programs and the interdependence of these
distributions and the number and type of instructions will be ascertained.

The Phase II experiments utilize a typical avionics system self-test program which consists of 241
separate, sequential tests. The program consists of 2000 executed instructions which require an
execution time of 3 milliseconds on the BDX-930.


4.5  Phase  I  Experiments 

This phase consisted of six programs each of which was coded in the assembly language of the BDX-930.
For the purpose of comparison with the experiments in (Nagel, P., 1978) the instructions of the BDX-930
were primarily restricted to the following set:




LOAD 

STORE 

ADD 

SUBTRACT 

BRANCH 

In the following programs the initializing variables were stored simultaneously in the 36 copies of the
scratchpad memories. The ground rules governing the experiments were:

o  After each repetition of the program (for a randomly selected set of inputs) by the non-faulted
processor the output variables from each repetition are compared, term by term, with those of
the faulted processors. The first repetition which resulted in a miscomparison is referred to
as the "time-to-detect" or "latency time".

o  At the start of each experiment the initial conditions for all subsequent repetitions were
stored in successive locations of memory and the program counter was preset to the address of
the first instruction.


An experiment consisted of executing one of the following programs in the presence of a single fault.

a)  FIBONACCI (FIB)

Create a Fibonacci series starting with a pair of integers. Eight terms are generated, each
term constituting a repetition.

b)  FETCH AND STORE (FETSTO)

Fetch an integer from memory and store in another location. This process is repeated eight times.

c)  ADD AND SUBTRACT (ADDSUB)

Fetch two integers from memory, compute the difference and sum, and store them in memory.
Repeat eight times.

d)  SEARCH AND COMPUTE (SERCOM)

Fetch three integers, A, B, C, from memory and set

    $S_1 = B + C$,  $S_2 = B$        if $B < A$

    $S_1 = B + C$,  $S_2 = B - C$    if $A < B$ and $C < A$

    $S_1 = B - C$,  $S_2 = BC$       if $A < B$ and $A < C$

and store $S_1$ and $S_2$ in memory. Multiplication is performed by successive addition.

e)  LINEAR CONVERGENCE (LINCON)

A line, characterized by slope, M, and Y-intercept, Y, is given. A positive abscissa, X, is
selected at random. By successively incrementing or decrementing the slope, M, by 1 the slope is adjusted
to obtain a line that crosses the X-axis prior to X and has a minimum deviation from the X-axis. The new
slope and ordinate at X are stored in memory for comparison.

f)  QUADRATIC (QUAD)

Fetch four integers A, B, C, X from memory and compute and store $AX^2 - BX - C$. Multiplication
is performed by repeated addition. The process is repeated four times.

The type and frequency of the instructions executed in the above programs are given in Table 3.
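The comparison-monitoring ground rules can be sketched with the FIB program. The following is an illustrative sketch, not the emulator used in the study: the stuck-at fault is modelled, as an assumption, by forcing one bit of every adder result, and all function names are hypothetical.

```python
def fib_terms(a, b, count=8, add=lambda x, y: x + y):
    """Generate `count` Fibonacci terms; each term is one repetition."""
    terms = []
    for _ in range(count):
        a, b = b, add(a, b)
        terms.append(b)
    return terms


def stuck_at_1(bit):
    """Hypothetical fault model: adder output bit `bit` is stuck at 1."""
    return lambda x, y: (x + y) | (1 << bit)


def stuck_at_0(bit):
    """Hypothetical fault model: adder output bit `bit` is stuck at 0."""
    return lambda x, y: (x + y) & ~(1 << bit)


def latency(good_outputs, faulted_outputs):
    """First repetition (1-based) with a miscomparison, or None if never detected."""
    pairs = zip(good_outputs, faulted_outputs)
    for k, (good, faulted) in enumerate(pairs, start=1):
        if good != faulted:
            return k
    return None


good = fib_terms(1, 1)
detected_late = latency(good, fib_terms(1, 1, add=stuck_at_1(1)))
never_detected = latency(good, fib_terms(1, 1, add=stuck_at_0(20)))
```

A fault on a bit that the eight terms never exercise yields no miscomparison, which is the latent-fault case the experiments are designed to count.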


4.6  Phase II Experiments

This phase consists of injecting faults and executing a typical avionic flight control system
self-test program to determine failure detection coverage. The self-test program was written expressly
for this study.


The task of designing the self-test was given to an experienced programmer with considerable
expertise in self-test. The only requirement imposed was that the resultant test should achieve a
coverage of 95%. The result was a program consisting of 2000 executed instructions with an execution time of
3 milliseconds. The detection strategy was that of exercising every instruction type at least once and,
in most cases, with numerous variations.


Self-Test


The program consisted of 241 subtests. After a successful completion of a test the program
increments a register and proceeds to the next test in the sequence. If, however, a failure is detected the
program skips the remaining tests and transfers the contents of the register to a designated memory
location whose contents become the measure of failure detection. In the Phase II experiments a fault was
defined as detected if, after a complete execution of the self-test program by the non-failed processor,
the contents of the designated memory did not equal 241 in the faulted processor. Observe that, according
to this definition, a fault is detected if the faulted processor jumps out of the program, gets hung up in
an infinite loop or executes a single extra instruction before transferring the contents of the incremental
register to memory.
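The detection criterion just described can be sketched as follows. This is a simplified model under stated assumptions: each subtest is represented by a hypothetical callable returning True on success, and a processor that never writes the designated location (wild branch or infinite loop) is represented by the value None.

```python
NUM_SUBTESTS = 241


def run_self_test(subtests):
    """Return the register value transferred to the designated memory location."""
    register = 0
    for subtest in subtests:
        if not subtest():
            break           # failure detected: skip the remaining tests
        register += 1       # successful subtest: increment and continue
    return register


def fault_detected(memory_value, expected=NUM_SUBTESTS):
    """A fault is detected if the designated location does not hold 241."""
    return memory_value != expected


healthy = [lambda: True] * NUM_SUBTESTS
faulty = [lambda: True] * 10 + [lambda: False] + [lambda: True] * 230
```

Under this criterion the stored value also indicates which subtest first failed, which is how the study could count the subtests that actually produced a detection.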

5.  URN  MODEL 

5.1  Urn  Model  Description 

Several models have been investigated in an attempt to characterize the dynamics of fault
propagation in a digital computer. Although simplistic in their assumptions, these models may, nevertheless,
provide insight into this undoubtedly complex process. It has been conjectured (Nagel, P., 1978) that the
distribution of latency can be modelled by analogy with balls in an urn. We prefer to employ a different
analogy although the resultant distributions are the same.

We postulate that the computer can be subdivided into three sets of mutually exclusive components
$C_1$, $C_2$, $C_3$ such that

    $C_1$ = Set of components randomly exercised by the program

    $C_2$ = Set of components continually exercised by the program

    $C_3$ = Set of components never exercised by the program.

We make the further assumption that a fault is detected if and only if the faulted component is
exercised. The scenario is that of an avionics computer executing two software programs, one of which is
executed full-time and the other, part-time. The components that are exercised by the full-time mode are
denoted by $C_2$ and those exercised by the part-time mode by $C_1$. Neither the full-time nor the part-time
mode exercises components $C_3$.

We assume that the part-time mode is exercised randomly. If the unit of time is a repetition of the
full-time program then we postulate that the excitation is Poisson-distributed in time with

    a = probability that the part-time mode is exercised in a repetition of the full-time program.

Let  $\lambda_1$ = Failure rate of $C_1$ (Failures/hour)

     $\lambda_2$ = Failure rate of $C_2$ (Failures/hour)

     $\lambda_3$ = Failure rate of $C_3$ (Failures/hour)

     $\lambda = \lambda_1 + \lambda_2 + \lambda_3$ (Failures/hour)


We now derive the latency distribution given that a fault has just occurred. The distribution is
defined in terms of three parameters, a, P and $Q_0$, where

    P = Probability that the fault is detected in the first repetition given that it occurred in sets
        $C_1$ or $C_2$

    $Q_0$ = Probability that the fault is never detected.

It is easy to derive the following relationships:

7)  $P_0 = \frac{\lambda_1}{\lambda} + \frac{\lambda_2}{\lambda}$,  $Q_0 = \frac{\lambda_3}{\lambda}$

8)  $P = \frac{\frac{\lambda_2}{\lambda} + a \frac{\lambda_1}{\lambda}}{\frac{\lambda_2}{\lambda} + \frac{\lambda_1}{\lambda}}$

If

    $p_k$ = probability that the fault is detected in the k-th repetition and not detected in a previous
            repetition, k = 1, 2, 3, ..., n

    $q_{n+1}$ = probability that the fault is not detected in the previous n repetitions

then

    $p_1 = P_0 P = \frac{\lambda_2}{\lambda} + a \frac{\lambda_1}{\lambda}$

    $p_2 = (1 - P) \, a \, P_0 = a (1 - a) \frac{\lambda_1}{\lambda}$

9)  $p_n = (1 - P)(1 - a)^{n-2} \, a \, P_0 = a (1 - a)^{n-1} \frac{\lambda_1}{\lambda}$,  n = 2, 3, ...

    $q_{n+1} = Q_0 + \sum_{k=n+1}^{\infty} p_k = Q_0 + (1 - P) P_0 (1 - a)^{n-1}$

             $= \frac{\lambda_3}{\lambda} + (1 - a)^{n} \frac{\lambda_1}{\lambda}$,  n = 1, 2, 3, ...

Observe that

    $q_{n+1} + \sum_{k=1}^{n} p_k = 1$, as expected.
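The closed forms above can be checked numerically. The sketch below evaluates the Urn Model cell probabilities directly from the three failure rates and the excitation probability a; the function name and the example values are hypothetical and are chosen only for illustration.

```python
def urn_latency_distribution(lam1, lam2, lam3, a, n=8):
    """Return ([p_1, ..., p_n], q_{n+1}) for the Urn Model.

    lam1, lam2, lam3: failure rates of the randomly exercised, continually
    exercised and never exercised component sets; a: probability that the
    part-time mode is exercised in one repetition.
    """
    lam = lam1 + lam2 + lam3
    p = [lam2 / lam + a * lam1 / lam]                        # p_1
    p += [a * (1.0 - a) ** (k - 1) * lam1 / lam for k in range(2, n + 1)]
    q = lam3 / lam + (1.0 - a) ** n * lam1 / lam             # q_{n+1}
    return p, q


# Hypothetical rates and excitation probability (assumptions, not study data).
p, q9 = urn_latency_distribution(lam1=3.0, lam2=5.0, lam3=2.0, a=0.25)
```

With these example values the first cell dominates, and the residual q9 collects both the never-exercised set and the part of the randomly exercised set not yet excited after eight repetitions.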

In estimating the above distribution the number of repetitions will be limited to eight. Then, the
study will estimate the quantities

    $p_1, p_2, \ldots, p_8, q_9$

for S-a-1, S-a-0 and combined faults.

6.  STATISTICAL  ANALYSES 

6.1  Estimators  for  Self-Test  Coverage 

The estimators for x, y and z are

10)  $x^* = \frac{m_d}{m}$

11)  $y^* = \frac{n_d}{n}$

12)  $z^* = \frac{m_d + n_d}{m + n}$

where

    x (y, z) = probability that a S-a-0 (S-a-1, combined) fault is detected;
    $m_d$ ($n_d$) = number of S-a-0 (S-a-1) faults detected;
    m (n) = number of S-a-0 (S-a-1) faults injected.
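Estimators (10) to (12) amount to simple ratios. As a minimal sketch, the function below (a hypothetical name) is applied to the gate-level Self-Test counts reported in Section 7.3: 98 of 115 S-a-0 faults and 100 of 114 S-a-1 faults detected.

```python
def coverage_estimates(m_d, m, n_d, n):
    """Return (x*, y*, z*) for S-a-0, S-a-1 and combined faults."""
    x_star = m_d / m
    y_star = n_d / n
    z_star = (m_d + n_d) / (m + n)
    return x_star, y_star, z_star


# Gate-level Self-Test counts quoted in Section 7.3.
x_star, y_star, z_star = coverage_estimates(m_d=98, m=115, n_d=100, n=114)
```

Note that the combined estimate is the sample-size-weighted mean of the S-a-0 and S-a-1 estimates, not their simple average.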

6.2  Estimators  for  Latency 

The estimators for $x_k$, $y_k$ and $z_k$ are

13)  $x_k^* = \frac{m_k}{m}$

     $y_k^* = \frac{n_k}{n}$

     $z_k^* = \frac{m_k + n_k}{m + n}$,  k = 1, 2, 3, ..., 8,

where

    $x_k$ ($y_k$, $z_k$) = probability that a S-a-0 (S-a-1, combined) fault is detected in the k-th repetition;
    $m_k$ ($n_k$) = number of S-a-0 (S-a-1) faults detected in the k-th repetition.




With some abuse of terminology we define

    $x_9$ ($y_9$, $z_9$) = probability that a S-a-0 (S-a-1, combined) fault is not detected in the previous
    8 repetitions.

The estimators for $x_9$, $y_9$ and $z_9$ are

14)  $x_9^* = \frac{m - m_1 - m_2 - \ldots - m_8}{m} = 1 - x_1^* - x_2^* - \ldots - x_8^*$

     $y_9^* = \frac{n - n_1 - n_2 - \ldots - n_8}{n} = 1 - y_1^* - y_2^* - \ldots - y_8^*$

     $z_9^* = \frac{m \, x_9^* + n \, y_9^*}{m + n} = 1 - z_1^* - z_2^* - \ldots - z_8^*$
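The latency estimators (13) and (14) can be sketched as below. The per-repetition detection counts are hypothetical values invented for illustration; only the formulas follow the text.

```python
def latency_estimates(m_k, n_k, m, n):
    """Return the nine-cell estimates (x*, y*, z*) for cells 1..8 plus cell 9."""
    x = [mk / m for mk in m_k]
    y = [nk / n for nk in n_k]
    z = [(mk + nk) / (m + n) for mk, nk in zip(m_k, n_k)]
    x9 = (m - sum(m_k)) / m                 # undetected after 8 repetitions
    y9 = (n - sum(n_k)) / n
    z9 = (m * x9 + n * y9) / (m + n)
    return x + [x9], y + [y9], z + [z9]


# Hypothetical detection counts per repetition (assumptions, not study data).
m_k = [40, 6, 3, 1, 0, 0, 0, 0]     # S-a-0 faults detected in repetition k
n_k = [50, 4, 2, 0, 1, 0, 0, 0]     # S-a-1 faults detected in repetition k
x, y, z = latency_estimates(m_k, n_k, m=100, n=100)
```

Each of the three returned lists sums to one, so the ninth cell can equally be computed as one minus the sum of the first eight.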


6.3  Estimators for Urn Model Parameters

The method of estimation will be described for S-a-0 latency distributions. With an obvious change
in parameters, e.g., $m_k$, the estimates can be applied to S-a-1 and combined latency distributions, as well.

The method is based on the principle of maximum likelihood. We note that $m_k$ S-a-0 faults are
detected in the k-th repetition. Accordingly, we seek Urn Model parameters a, P and $P_0$ that maximize the
likelihood function


    $L = p_1^{m_1} \, p_2^{m_2} \cdots p_8^{m_8} \, q_9^{m_9}$

where

     $p_1 = P_0 P$

     $p_2 = (1 - P) \, a \, P_0$

15)  $p_3 = (1 - P) \, a \, P_0 (1 - a)$

     ...

     $p_8 = (1 - P) \, a \, P_0 (1 - a)^6$

     $q_9 = Q_0 + (1 - P) P_0 (1 - a)^7$

and $m_9 = m - m_1 - m_2 - \ldots - m_8$.


We note that $q_9$ corresponds to $x_9$ of Section 6.2.
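The likelihood function can be evaluated numerically as sketched below. This is an illustrative sketch only: it uses $Q_0 = 1 - P_0$, and it substitutes a coarse grid search for an analytic solution, which is an assumption made here for simplicity and not the method of the study. All function names are hypothetical.

```python
import math
from itertools import product


def cell_probabilities(a, P, P0):
    """Return [p_1, ..., p_8, q_9] for Urn Model parameters a, P and P0."""
    p = [P0 * P]
    p += [(1 - P) * a * P0 * (1 - a) ** (k - 2) for k in range(2, 9)]
    q9 = (1 - P0) + (1 - P) * P0 * (1 - a) ** 7      # Q0 = 1 - P0
    return p + [q9]


def log_likelihood(counts, a, P, P0):
    """Logarithm of L for the nine cell counts m_1, ..., m_9."""
    probs = cell_probabilities(a, P, P0)
    return sum(c * math.log(pr) for c, pr in zip(counts, probs))


def grid_mle(counts, steps=20):
    """Crude maximum likelihood estimate of (a, P, P0) on a regular grid."""
    grid = [i / steps for i in range(1, steps)]      # excludes 0 and 1
    return max(product(grid, repeat=3),
               key=lambda t: log_likelihood(counts, *t))
```

Maximizing the logarithm of L is equivalent to maximizing L itself and avoids numerical underflow when the counts are large.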

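The maximum likelihood estimators for a, P and $P_0$ are obtained as the solution of the likelihood equations

    $\frac{\partial L}{\partial a} = 0$,  $\frac{\partial L}{\partial P} = 0$,  $\frac{\partial L}{\partial P_0} = 0$.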


6.4  Accuracy and Confidence of Coverage Estimates

It can be shown (Mood, A. M., 1950) that

16)  $E(x^*) = x$,  $E(y^*) = y$,  $E(z^*) = z$

and

17)  $E((x - x^*)^2) = \frac{x (1 - x)}{m}$

     $E((y - y^*)^2) = \frac{y (1 - y)}{n}$

     $E((z - z^*)^2) = \frac{z (1 - z)}{m + n}$

where

    $E(\cdot)$ = expected value of $(\cdot)$.

For m, n sufficiently large the estimators x*, y* and z* are approximately Gaussian with means and
variances given by (16) and (17), respectively.




The following derivation of accuracy and confidence is general and applies to any quantity, x,
estimated by the method of Section 6.2. As before,

    x* = estimate of x

    m = sample size.

It is well known (see ref. 3) that the probability that x lies between the limits

     $x^* \pm \lambda \sqrt{\frac{x (1 - x)}{m}}$

or, equivalently, that x* lies between the limits

18)  $x \pm \lambda \sqrt{\frac{x (1 - x)}{m}}$

is equal to $\gamma$, where $\gamma$ is the area of the standard Gaussian distribution between $-\lambda$ and $\lambda$. From (18) we
may say that the error in the estimate, x*, is

19)  $e = \lambda \sqrt{\frac{x (1 - x)}{m}}$

with a confidence level of $\gamma$.

Equation (19) is an ellipse in x. Table 4 gives a tabulation of $e \sqrt{m}$ versus x for a confidence
level of $\gamma = .95$.

It is often convenient to obtain error estimates that are independent of x. From (19) it can be
seen that the maximum error occurs when x = 1/2. Table 5 gives a tabulation of this maximum error versus
sample size and confidence level. It is noted that the maximum error can be extremely conservative.
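Equation (19) and the maximum-error bound can be evaluated as sketched below. The sketch assumes the Gaussian approximation stated above and uses the Python standard library to obtain the multiplier for a given confidence level; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gaussian_multiplier(confidence):
    """Half-width, in standard deviations, of the central area `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def estimate_error(x, m, confidence=0.95):
    """Equation (19): error in x* for sample size m at the given confidence."""
    return gaussian_multiplier(confidence) * sqrt(x * (1.0 - x) / m)


def max_error(m, confidence=0.95):
    """Worst-case error over x, attained at x = 1/2."""
    return estimate_error(0.5, m, confidence)
```

For instance, `estimate_error(0.05, 1)` gives the value of the error multiplied by the square root of the sample size at x = .05, and `max_error(200)` bounds the error for a sample of 200 faults regardless of the true coverage.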

7.  RESULTS OF EXPERIMENTS

7.1  Distribution  of  Faults 

Initially, 1,000 gate-level and 400 component-level faults were randomly selected. Later, in order
to reduce the cost of the runs, it was necessary to reduce the number of faults actually injected. The
number of faults finally selected for each experiment is given in Table 6.

7.2  Phase  I  Experiments 

The results of the Phase I experiments are summarized in Tables 7 and 8.


Table  7 


This table shows the breakdown of faults injected versus faults detected in each of the six
programs. Also shown is the percentage of undetected faults after completion of the specified number of
repetitions of each program.

Table  8 


This table gives the maximum likelihood estimates of a, P and $P_0$, as defined in Section 5. Also
shown is the resultant, computed, Urn Model distribution in terms of the occupancy probabilities of
cells 1, 2, ..., 8. These correspond to the probabilities $x_k$, $y_k$ or $z_k$ for S-a-0, S-a-1 and combined
faults, respectively. In keeping with our previous notation, the occupancy probability of cell 9 is actually
the probability that the fault is undetected in the previous 8 repetitions. As a comparison, the
corresponding empirical distributions are also given.

Figures 3a through 5b show histograms of detected faults versus repetitions to detection for
combined (i.e., S-a-0 and S-a-1) faults at both the gate and component levels. Superimposed on each
histogram is the distribution of the corresponding Urn Model.

7.3  Phase II Experiments

Indistinguishable  Fault  Estimates 

In order to obtain an estimate of the proportion of indistinguishable faults each resultant,
undetected fault was analyzed and those faults which were obviously indistinguishable were disqualified.
At the gate-level, 71 out of 300 faults were identified as indistinguishable and, at the component-level,
11 out of 200 were identified as indistinguishable. Thus, the estimated proportions of components
yielding indistinguishable faults are:

    $\gamma^* = 71/300 = 0.2366$ at the gate-level

and $\gamma^* = 11/200 = .055$ at the component-level.

Since indistinguishable faults were not disqualified in the Phase I experiments, all coverage
estimates of Phase I should be divided by the appropriate $1 - \gamma^*$ factor, as prescribed in Section 4.3.

Self-Test Coverage

After disqualifying 71 indistinguishable faults, 229 faults were effectively injected at the
gate-level and 189 at the component-level. The resultant raw data are given in Table 9 by partitions.

As indicated previously, after each injected fault the self-test program was executed. Faults
were generally detected either because an explicit test detected the fault or the fault caused a jump out
of the program. These latter faults are denoted in Table 9 by "wild branches".

Summary  of  Results  of  Phase  II  Experiments 

Gate-Level  Faults 

•  198 out of 229 combined faults were detected for a coverage of 86.46%.

•  100 out of 114 S-a-1 faults were detected for a coverage of 87.72%.

•  98 out of 115 S-a-0 faults were detected for a coverage of 85.22%.

•  9 out of 17 faults in Partition #5 were detected for a coverage of 52.94%.

•  5 out of 8 faults in Partition #6 were detected for a coverage of 62.5%.

•  If faults in Partitions #5 and #6 are disqualified then 184 out of 204 faults were detected for
a coverage of 90.2%.

•  103 out of the 198 faults detected resulted in wild branches, i.e., 52%.

•  95 faults were detected by an explicit test (even though it was not always possible to identify
the test).

•  Out of the 241 possible tests, at most 46 actually resulted in a detection, i.e., most of the
tests were, effectively, redundant.

Component-Level Faults

•  185 out of 189 combined faults were detected for a coverage of 97.9%.

•  97 out of 100 S-a-1 faults were detected for a coverage of 97%.

•  88 out of 89 S-a-0 faults were detected for a coverage of 98.9%.

•  106 out of 189 faults detected resulted in wild branches, i.e., 56%.

•  79 faults were detected by an explicit test (even though it was not always possible to identify
the test).

•  Out of 241 possible tests, at most 44 actually resulted in a detection.

8.  SUMMARY  OF  RESULTS  OF  EXPERIMENTS 

8.1  Phase I Experiments

•  Most detected faults are detected in the first repetition. Subsequent repetitions do not
appreciably increase the proportion of detected faults.

•  S-a-1 faults are easier to detect than S-a-0 faults.

•  The micromemory contains a large proportion of indistinguishable faults.

•  A large proportion of faults remain undetected after as many as 8 repetitions.

•  Component-level faults are easier to detect than gate-level faults.

•  The coverage estimates of the Phase I experiments are not corrected for indistinguishable fault
content.

Subsequent analysis of undetected faults indicates that the proportion of indistinguishable faults
at the gate-level is 23.66% and 5.5% at the component-level. The combined, S-a-1 and S-a-0 coverage
estimates should be corrected by dividing the raw coverage by $1 - \gamma^*$ where

    $1 - \gamma^*$ = .7633 for gate-level coverage

                   = .945 for component-level coverage.




The poor detection coverage of the six programs of Phase I is not surprising, particularly if one
considers that Self-Test, which exercises a much greater mix and quantity of instructions, achieves 86.5%
detection (at the gate-level). Table 10 shows the instruction mix and quantity of instructions executed
versus coverage for each of the six programs. By contrast, Self-Test exercises almost the entire
instruction set of the CPU and executes approximately 2000 instructions in a single pass.

8.2  Phase  II  Experiments 

•  There is a significant difference in coverage of gate-level versus component-level faults, e.g.,
after disqualifying indistinguishable faults gate-level fault coverage was 86.5% whereas
component-level fault coverage was 97.9%.

•  There was a large proportion of indistinguishable faults in the gate-level emulation, e.g.,
23.7%. The worst offender was the micromemory, which yielded 33 indistinguishable faults out of
a total of 41 selected.

•  Only 48% of all detected faults were detected by an explicit test, i.e., 95 out of 198. 103
faults were detected because the fault resulted in a wild branch, i.e., a jump out of the first
test.

•  Most of the 241 tests comprising Self-Test were redundant; only 46 tests resulted in a detection.

•  Of the 95 faults detected by an explicit test, 59 were detected by the first 23 tests.

•  This particular Self-Test was designed to exercise an instruction set rather than explicit
hardware. As noted in Section 10, this approach results in an inefficient Self-Test since, it
turned out, most of the tests exercised the same hardware.

8.3  Urn  Model  Distributions 

From previous studies and results of experiments we make the following observations regarding the
Urn Model.

•  Despite its simplicity the Urn Model results in good correlation with all of the empirical
distributions of the study. This is not surprising considering that the model has 3 degrees of
freedom available for a best fit, i.e., P, $P_0$ and a, and the empirical distributions are heavily
weighted in the first, second and last latency cells.

9.  CONCLUSIONS 

On  the  basis  of  the  study  we  conclude: 

•  Emulation is a practicable approach to failure modes and effects analysis of a digital processor.

•  The run time of the emulated processor on a PDP-10 host computer is only 20,000 to 25,000 times
slower than the actual processor. As a consequence large numbers of faults can be studied at
relatively little cost and in a timely manner.

•  The fault model, although somewhat arbitrary, can be updated as more data becomes available.

•  Gate-level equivalent circuits are available for digital devices including the 2901A.

•  Gate-level faults are more difficult to detect than component-level faults.

•  A computer self-test program of the order of 2000 executable instructions can detect 98% and
possibly 99 or 100% of component-level faults. The feasibility of detecting the same proportions
of gate-level faults remains to be determined.

•  Emulation can be an important tool in the design of an efficient self-test.

•  In a comparison-monitored system the accumulation of latent faults can be significant. In the
study the proportion of undetected faults after 8 repetitions ranged from 40 to 62%.

•  For the range of values considered the proportion of undetected faults after 8 repetitions is a
linear function of the number of executable instructions.

•  With a suitable choice of parameters the Urn Model can be used to describe fault latency in a
comparison-monitored system.

•  Faults in the micromemory are difficult to detect.

•  In a comparison-monitored system most detected faults are detected in the first repetition of
the program. Subsequent repetitions do not appreciably increase the proportion of detected
faults.

•  A gate-level emulation of a real processor may contain a large proportion of indistinguishable
faults. Identifying such faults is difficult.

•  Only 48% of all detected faults were detected by an explicit subtest of Self-Test; 52% were
detected because the fault resulted in a wild branch.




Concluding  Remarks 

The outcome of this study was no less surprising or intriguing than the results of the Nagel pilot
study. Most of the data generated in the Nagel study were essentially duplicated in this study, which in
itself is remarkable because of the two "very different" hardware processors used in the studies (see
Table 11 for a comparison). A significant finding of this work which correlates well with Nagel's
observations is that comparison-monitoring yields a detection coverage which ranges from 40 to 60 percent and
is in sharp contrast to assumed values of unity for first failure coverage in comparison-monitoring or
majority-voting (cm/mv) systems. Admittedly, the presence of undetected faults does not in and of itself
constitute computer failure, but it does cast doubt on the validity of state-of-the-art reliability
assessments and causes one to wonder what those latent faults are doing in the computer.

Another important finding of this study relates to the question of where faults should be induced,
at the gate or pin (functional) level, to evaluate self-test computer programs. The study shows a wide
dispersion in results between the two methods, i.e., 87-percent gate level versus 98-percent pin level.
The issue is far from trivial because the proponent of pin-level testing can argue that the 11 percent that
did not get detected by the pin-level method are don't-care faults anyway, whereas the proponent for the
gate-level method argues that 11 percent of the faults are still present in the computer and may be
manifested at a most inopportune time.

Currently it is simply not clear what impact latent faults could have in digital computers and their
possible effects on fault-tolerant computer fault detection. The reliability analyst must be conservative
when he cannot be accurate, so these findings must have a negative impact on reliability predictions for
cm/mv systems, i.e., detection values based on gate-level fault injection must be used in reliability
predictions in lieu of the pin-level value. Furthermore, these results strongly suggest a more conservative
approach to fault detection in fault-tolerant systems utilizing cm/mv detection schemes. One approach to
enhance detection would be to employ both cm/mv detection and concurrent periodic self-test.

Finally, the vehicle and its application which made these results possible deserve special emphasis.
Although gate-level simulation is not new, the approach used in this study makes practical the generation
of coverage data, particularly for cm/mv schemes, and opens up a new horizon of uses for such a tool, some
of which were explored and reported on. Its use for designing efficient self-test code, identification of
indistinguishable faults, a practical approach to failure modes and effects analysis, and fault analysis
in general are just some that come to mind. In summary, gate-level simulation most assuredly will become
an essential tool to design and reliability engineers.

10.  RECOMMENDATIONS  FOR  FUTURE  STUDIES 

•  The Phase I experiments should be repeated using flight critical, flight control computations.
The instruction set should not be limited as it was in the present study. Additional tasks
would include

    •  Determination of the proportion of faults that affect the control surfaces.

    •  Determination of the proportion of faults that prevent failure detection in the faulted
    processor.

•  Investigate other methods of fault detection such as the use of redundant computations in a
non-redundant processor in a flight critical, flight control application.

•  Investigate the feasibility of extending the emulation to I/O interface devices such as AD and
DA converters, I/O controllers, etc.

•  Generate more realistic fault models. Perhaps manufacturers could be prevailed upon to supply
equivalent circuits that are more closely correlated with failure modes as well as with
performance.

•  Develop a more realistic Urn Model. The resultant model could be an important tool in
reliability modelling of a redundant system.


REFERENCES 

1.  Nagel, P., "Modeling of a Latent Fault Detector in a Digital System," Vought Corporation, NASA contract
NAS1-13500, August, 1978.

2.  Seshu, S. and Freeman, D. N., "The Diagnosis of Asynchronous Sequential Switching Systems," IRE
Transactions on Electronic Computers, Vol. EC-11, No. 4, August, 1962, pp. 459-465.

3.  Hardie, F. H., and Suhocki, R. J., "Design and Use of Fault Simulation for Saturn Computer Design,"
IEEE Transactions on Electronic Computers, Vol. EC-16, No. 4, August, 1967, pp. 412-429.

4.  Cramer, H., Mathematical Methods of Statistics. Princeton University Press: Princeton, 1958.

5.  Mood, A. McFarlane, Introduction to the Theory of Statistics. McGraw-Hill: New York, 1950.


FIGURE 1  PROCESSOR ARCHITECTURE




TABLE 1  COMPONENTS OF THE BDX-930 CPU

UNIT                      FAILURE RATE

2901A                     2.ICS6
2902                      0.3898
5440                      0.0653
54125                     0.0655
54155                     0.1448
54175                     0.1755
54S00                     0.0855
54S04                     0.1003
54S06                     0.084
54S10                     0.0764
54S20                     0.0654
54S32                     0.2138
54S151                    0.1483
54S288 (32x9 prom)        0.1787
54S472 (512x9 prom)       1.009
54LS00                    0.084
54LS02                    0.0(14
54LS04                    0.09 a>3
54LS08                    0.084
54LS11                    0.0752
54LS32                    0.084
54LS86                    0.0M
54LS113                   0.1447
54LS151                   0.1483
54LS153                   0.1447
54LS159                   0.1410
54LS165                   0.6603
54LS175                   0.1703
54LS245                   0.3792
54LS253                   0.1447
54LS257                   0.1636
54LS273                   0.6382
54LS283                   0.2681
54LS352                   0.3117
54LS367                   o.noo
54LS374                   0.7234
54LS377                   0.7148


TABLE 2  MICROCIRCUITS AND EQUIVALENT GATE COUNT

DEVICE                    EQUIVALENT GATES

2901A                     799
2902                      19
5411)                     1
54151                     17
54153                     16
54159                     15
54169                     59
54175                     22
54245                     19
54253                     16
54273                     34
54352                     16
54374                     26
54377                     35
9407                      143

NOTE:  Ttw  rMlicannt  for  th»  FOOT  1.cli-4*<  tho  follwl*v  Onlcn: 

V4LSQ0.  ML'IM .  54L5TIJ,  MU|7S,  SAlSlJA  l  5415357. 


TABLE 4

Error Ellipse for a Confidence Level of $\gamma = .95$


$e \sqrt{m} = \lambda \sqrt{x (1 - x)}$


TABLE 3  TYPE AND FREQUENCY OF INSTRUCTIONS EXECUTED


Bn 

1  ftTSTO 

a 

HG5H 

L0At> 

2 

1  , 

2 

r-rn 

m m 

STORE 

1 

1  T 

2 

7 

* 

_ 

L* 

ADD 

3 

g  ■ 

2 

17 

16 

SHTI’ACT 

a 

55 

— 

1 

4 

1 

BRANCH 

i.i 

55 

24 

39 

39 

TRANSFER 

0 

i 

i  .. .  . 

2 

5 

11 

6 

CLEAR 

1 

i _ 

1 

TABLE 5

MAXIMUM ERROR VERSUS SAMPLE SIZE AND CONFIDENCE LEVEL


x - OTHT~ 

SIZE 

COUFIOGI^n. 
UEVli.  \ 

200 

300 

— 

400 

600 

1000 

.6 

0J 

.026 

.021 

,C17 

.01) 

-  .7 

.037 

.03 

,02E 

.021 

.017 

.9 

.046 

.038 

.033 

.027 

,01) 

,044 

.041 

.034 

.011 

_ JS _ 

.  jg .. 

,0«V 

.04 

-■ML. 

re 

ft 

0.0 

0 

.*27 

.09 

.599 

.1 

.70 

.15 

.*84 

.2 

.949 

.29 

.999 

.3 

.935 

.36 

.HO 

.4 

.975 

41 

.99 

.8 

.975 

.If 

M 

.6 

.9)5 

.66 

899 

.7 

.249 

.71 

.714 

.1 

.7 

M 

.598 

.9 

.427 

.96 

0.0 

1.0 

[Table (number illegible). Per-program values, including the SELF-TEST program; caption, program names and column headings largely illegible]

[Table 7. Summary of Phase I results; rows labelled COMBINED, T and A for each test case; column headings illegible]

[Captions of two further tables of parameter estimates; illegible]


Hierarchical  Specification  of 
the  SIFT  Fault  Tolerant  Flight  Control  System 


P. M. Melliar-Smith and Richard L. Schwartz

Computer Science Laboratory
SRI International
Menlo Park, CA 94025


Abstract 

This paper describes work in progress at SRI on the specification and mechanical verification of the Software Implemented Fault
Tolerance (SIFT) flight control system. The methodology employed in the verification effort is discussed, and a description of the
hierarchical models of the SIFT system is given.


Introduction 

To meet the objectives of NASA for the reliability of safety-critical flight control systems, the SIFT computer must achieve a
reliability well beyond the levels at which reliability can actually be measured. This paper describes the methodology employed to
demonstrate rigorously that the SIFT computer meets its reliability requirements. We explain the hierarchy of design specifications
from very abstract descriptions of system function down to the actual implementation. The most abstract design specifications can be
used to verify that the system functions correctly and with the desired reliability, almost all details of the realisation having been
abstracted out. A succession of lower-level models refines these specifications to the level of the actual implementation, and can be used
to demonstrate that the implementation has indeed the properties claimed of the abstract design specifications.

The SIFT (Software Implemented Fault Tolerance) computer is an aircraft flight control computer developed by SRI for the
NASA ACEE program, under the direction of B. Dove and N. Murray of the Flight Electronics Division of NASA Langley Research
Center. A SIFT system, designed to meet the required ultra high reliability by processor replication and voting, has been constructed
by Bendix Corporation and is now operating at SRI. It will shortly be delivered to NASA Langley for evaluation in the Airlab.
Rather than providing a general introduction to the SIFT system and the algorithms used to achieve the desired fault tolerance, we
explore the process of refining the high level specifications of SIFT down to the implementation level. A general introduction to SIFT
can be found in [5, 2] and a description of the SIFT executive appears in [4]. The SIFT hardware is documented in [1]. The
fault-tolerance algorithms employed are defined in [2, 3].

Sections 1 and 2 of the paper present a brief introduction to the requirements of SIFT and the mechanisms employed to cope
with the reliability requirements. Section 3 discusses how formal proof is used to substantiate the reliability claims. Section 4 outlines
the specification hierarchy. Sections 5 through 8 describe each of the functional models in detail. The probabilistic analysis of system
reliability is discussed in Section 9. Finally, Section 10 gives the current status of the project.


1.  The  Requirements  for  SIFT 

The SIFT computer system has been designed to meet the requirements for future passenger aircraft control. Such aircraft must
be designed to use significantly less fuel than current aircraft. Many design innovations are expected to assist in achieving the desired
fuel economy: innovations in materials, structures, aerodynamics, engines, and almost every other aspect of aircraft design. Several of
these innovations will require computer control of the flight of the aircraft, particularly to maintain the stability of the aircraft and to
reduce the stresses in the structures of the aircraft. This computer control will be essential at all times to ensure the safety of flight.
Existing aircraft use computers for various purposes, but never to perform flight safety critical functions, and thus do not have to meet
the very demanding reliability requirements that apply to safety critical components of the aircraft.

The reliability requirement for a safety critical flight control computer, as proposed by FAA and NASA, allows a probability of
life threatening failure no greater than 10^-9 during a 10 hour flight. This is equivalent to a mean time between failures of about one
million years of operation. The requirement allows higher rates for less critical failures, but the difficulty of assessing all the
consequences of failures in computer systems has led us to regard any deviation from the "correct" output as a failure of the system.
The SIFT computer system has been designed not only to meet this reliability requirement but also to make it possible to demonstrate
that this extreme requirement is indeed met.
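
The equivalence between the per-flight failure probability and the mean time between failures can be checked with a few lines of arithmetic. The sketch below assumes a constant failure rate (so that the probability of failure in a short interval is approximately rate times duration); the function name and the constant HOURS_PER_YEAR are illustrative, not part of the SIFT documentation.

```python
HOURS_PER_YEAR = 24 * 365.25  # average calendar year, in hours


def mtbf_years(p_fail: float, flight_hours: float) -> float:
    """MTBF implied by a failure probability p_fail over flight_hours.

    Assumes a constant failure rate and p_fail << 1, so that
    p_fail ~= rate * flight_hours.
    """
    rate_per_hour = p_fail / flight_hours
    return 1.0 / rate_per_hour / HOURS_PER_YEAR


if __name__ == "__main__":
    # 10^-9 per 10 hour flight -> 10^10 hours, a little over a million years
    print(f"{mtbf_years(1e-9, 10.0):.3g} years")
```

Under these assumptions the requirement corresponds to 10^10 operating hours between failures, which is consistent with the figure of about one million years quoted above.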


2.  The  Role  of  Formal  Proof 

The extreme reliability requirement on SIFT imposes a very severe problem in substantiating the achievement of that level of
reliability. At the required reliability, mere observation, even of a large number of systems, will be ineffective. Further, a SIFT
system must be able to recover successfully from several million faults for every allowable system failure, and must therefore be able to
recover from quite improbable and unforeseen faults and even combinations of faults. Thus validation by fault injection, while
necessary, is unlikely to convince us that SIFT meets its reliability requirements.

The justification that SIFT meets the reliability requirement must be based on an extrapolation from fault rates that are easier to
measure, such as those for an individual processor. For SIFT, this extrapolation takes the form of a discrete Markov analysis, with the
numbers of working and faulty processors defining the states and the fault and reconfiguration rates defining the transitions. The
validity of this extrapolation depends on a number of assumptions, and at the desired level of reliability, even 'minor' violations of the
assumptions can have significant effects on the reliability achieved. Thus the assumptions must themselves be quite rigorously
substantiated if the claimed reliability is to be believed. For instance, one important assumption of the Markov analysis is that the
occurrence of faults is well described by a Poisson model with complete independence between processors. Much of the electronic and
mechanical design of SIFT is intended to maintain this independence.
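
A minimal sketch of such a discrete Markov analysis is given below. It is not the SIFT reliability model: the state is simply the pair (working processors, faulty processors not yet reconfigured out), the time step dt, the rates, and in particular the safety condition in is_safe (at most one unmasked faulty processor, and more working than faulty processors) are assumptions chosen for illustration.

```python
from collections import defaultdict


def is_safe(working, faulty):
    # Illustrative assumption: voting can mask at most one faulty processor.
    return faulty <= 1 and working > faulty


def prob_unsafe(n, fault_rate, reconfig_rate, hours, dt=1e-3):
    """Probability of entering an unsafe state within `hours`.

    n             -- initial number of working processors
    fault_rate    -- faults per processor per hour (independent, Poisson)
    reconfig_rate -- reconfigurations per faulty processor per hour
    dt            -- time step in hours (first-order approximation)
    """
    p_fault = fault_rate * dt
    p_fix = reconfig_rate * dt
    dist = {(n, 0): 1.0}   # probability mass over safe states
    unsafe = 0.0           # mass absorbed in unsafe states
    for _ in range(round(hours / dt)):
        nxt = defaultdict(float)
        for (w, f), p in dist.items():
            a = w * p_fault   # one working processor becomes faulty
            b = f * p_fix     # one faulty processor is reconfigured out
            if a + b > 1.0:
                raise ValueError("dt too large for the given rates")
            moves = [((w - 1, f + 1), a), ((w, f - 1), b), ((w, f), 1.0 - a - b)]
            for state, q in moves:
                if q == 0.0:
                    continue
                if is_safe(*state):
                    nxt[state] += p * q
                else:
                    unsafe += p * q
        dist = nxt
    return unsafe
```

Even in this toy model the qualitative behaviour described in the text appears: for a fixed fault rate, the probability of reaching an unsafe state falls as the reconfiguration rate rises, because less time is spent with an unmasked fault present.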

The validity of the Markov analysis depends also on the assumption that the states and the transitions of the Markov model
correspond accurately to the actual system, and that the states in which system failure is possible are correctly identified. But this
correspondence is far from obvious, for the actual system has very many states with many complex transitions between them, and the
correspondence must be maintained even when one or more of the processors has suffered a fault. In SIFT, this correspondence is
based on a predicate system safe indicating that the replication of each of the tasks is sufficient so that the voting can mask the effects of
the faults present in the system. The validation of SIFT now consists of two parts. The first of these is a demonstration that, so long as
system safe is true, the system performs the desired flight control function, even though one or more processors may be faulty. This is a
correctness property for the function performed by the system. The second is a demonstration that the Markov analysis computes an
upper bound on the probability that system safe becomes false. This is a correctness property for the probabilistic reliability model of
the system. Because even a very small defect in the demonstrations could allow failures at an unacceptable rate, these demonstrations
must be performed with the rigor of mathematical proof.

The research reported herein was supported by the NASA Langley Research Center under Contract NAS1-15428.

The necessity for formal mathematical proof to ensure that SIFT meets the desired functional and reliability requirements
presents two major issues:

•  How does one define the criteria sufficient to ensure the correct functioning of the system?

•  How does one prove that the criteria are satisfied by the actual system?

The first issue is crucial if the formal verification effort is to have any practical significance. One must have confidence, even as a
non-computer scientist, that the formal specifications stating what is meant by the correct functioning of the system in fact reflect the
intended behavior. That a formal specification expresses what the system designer intuitively means is determined by inspection. A
formal specification must therefore be believable if rigorous mathematical correspondence to the specification is to ensure the desired
effect. The larger and more complex the system, the more acute the problem becomes. Specifications reflecting the detailed behavior
of the system allow the most straightforward formal verification effort, but it is difficult to ensure that low-level specifications embody
what is meant by the proper functioning of the system. Very high-level specifications, abstracting from the details of the system, are
necessary if we are to state the overall functional and fault-tolerance properties of the system in a way that can be understood and
believed. The problem then becomes one of reconciling the very high-level specifications with the detailed transformations performed
by the programs of the actual system.

In order to state high-level system specifications that can be shown to be consistent with the actual program, one must formulate
not just a single specification of the system, but a hierarchy of specifications. Our approach is to state a tiered set of models of the
system, as illustrated in the following picture. Each model L_i in the hierarchy specifies an abstract view of the system, defining the
properties of the system in terms of primitive predicates P_i and functions F_i employed at that level of abstraction. At each level in the
hierarchy, a model L_i can be seen as a refinement of the previous level L_{i-1}. Correspondence between successive model levels is done by
expressing each primitive function and predicate of the higher-level L_i in terms of the functions and predicates of the lower-level L_{i+1}.
With this mapping, one must then prove that each property derivable from the higher-level model can be proved from the lower-level
model. By demonstrating this for all successive levels L_i and L_{i+1}, one can conclude by induction that any property provable from the
highest-level model is also provable from the lowest-level model. Thus, the lowest-level model is consistent (or correct) with respect to
the highest-level model, ensuring that analysis of the system based on a higher level model in the hierarchy is valid and could have been
performed on the lowest-level model of the system.

    L_1       (P_1, F_1)
     ...
    L_i       (P_i, F_i)
    L_{i+1}   (P_{i+1}, F_{i+1})
     ...
    L_n       (P_n, F_n)

Within the hierarchy, the lowest level model of the system is the actual program executed by the hardware, while the highest level
model is chosen to allow the required properties of the system to be succinctly stated and analyzed. At different levels, models of the
system are specified in different manners. At the most abstract level, a model of the system is defined by a set of logical axioms
describing properties of the primitive functions and predicates. Such a model need not fully characterize the functional behavior of the
primitives, instead specifying only the relevant properties. A denotational, or functional, model of the system can be used to provide a
more concrete model of the system. The model is specified as a recursive function, providing an abstract implementation of the system.
At this level of abstraction, the full functionality of the system is specified. A still lower-level is an imperative model of the system.
Programs in a language such as Pascal or Ada are imperative models of a system, defining system function in terms of successive
transformations of a global state.

Our hierarchy of models of the system makes use of each of the three types of models just discussed. We employ axiomatic
models of the system at the highest levels of abstraction, denotational system models as intermediate levels of abstraction, and finally
several levels of imperative models corresponding to the levels of software support involved in compiling and running SIFT.

As we mentioned above, verification of the hierarchy consists of demonstrating that each property derivable from a higher level
model is supported by a lower level model. Between successive axiomatic models this is achieved by showing that, with the specified
mappings, each axiom of the higher level model is provable as a theorem at the lower level model. Between an axiomatic model and a
lower level denotational model, one must show that each function of the denotational model satisfies the axioms of the higher level
model. Finally, in order to verify the relationship between a denotational model and an imperative model it is necessary to show that,
based on a denotational model of the imperative language, the function performed by the imperative program is equivalent
(homomorphic) to the function specified at the higher level denotational model.
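
The three kinds of model can be illustrated on a toy scale. The sketch below is not taken from the SIFT specifications: it uses a hypothetical majority function, states one axiom about it, gives a recursive (denotational style) definition and a loop-based (imperative style) program, and then checks the two obligations described above by exhaustive testing over small inputs. Exhaustive testing is only a stand-in here for the mechanical proof that the paper calls for.

```python
from itertools import product


def count_rec(v, xs):
    """Recursive count of the occurrences of v in the tuple xs."""
    return 0 if not xs else int(xs[0] == v) + count_rec(v, xs[1:])


def maj_rec(xs, candidates=None):
    """Denotational style model: recursive definition of the majority value."""
    if candidates is None:
        candidates = xs
    if not candidates:
        return None
    head = candidates[0]
    if 2 * count_rec(head, xs) > len(xs):
        return head
    return maj_rec(xs, candidates[1:])


def maj_imp(xs):
    """Imperative style model: successive updates of a mutable state."""
    counts = {}
    for x in xs:
        counts[x] = counts.get(x, 0) + 1
    for value, n in counts.items():
        if 2 * n > len(xs):
            return value
    return None


def axiom_holds(maj, xs):
    """Axiom: any value held by a strict majority of xs must be the result.

    The axiom says nothing about the case with no majority, so it does not
    fully characterize the function.
    """
    return all(maj(xs) == v for v in set(xs) if 2 * xs.count(v) > len(xs))


def check(max_len=5, alphabet=(0, 1, 2)):
    """Check both obligations on every input up to max_len symbols."""
    for n in range(max_len + 1):
        for xs in product(alphabet, repeat=n):
            assert axiom_holds(maj_rec, xs)    # function satisfies axiom
            assert maj_imp(xs) == maj_rec(xs)  # program equals function
    return True
```

Because the imperative program agrees with the recursive function on every input checked, and the function satisfies the axiom, the axiom carries down to the program on those inputs; this is the step that the induction over successive levels repeats.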


3.  An  Outline  of  the  Design  of  SIFT

The SIFT aircraft control computer system is designed to achieve high reliability from standard computers by replication of the
hardware and adaptive voting implemented by software. The voting mechanism detects and masks hardware faults. Hardware
detected to be faulty is reconfigured out of the system, with its workload being transferred to other processors. Thus several successive
faults can be survived if there is sufficient time between them to permit the reconfiguration.

The system is constructed from up to eight identical computer units, each containing a Bendix BDX930 processor, a 32K main
store, a broadcast interface, and a 1553 interface, as shown in Figure 1. The BDX930 is a 16 bit processor specifically designed for
military and aircraft use, with an instruction set reminiscent of, but not compatible with, Data General computers and a speed of rather
less than one million instructions per second. Each BDX930 processor has its own 32K word main store, which cannot be accessed by
any other processor. The 1553 interface provides a serial bus connecting the processor to the various aircraft sensors and actuators.
The mean time between failures of one of these units, containing processor, store, and interfaces, is something less than one thousand
hours.

The processors communicate with each other through the broadcast interface, which contains the drivers and receivers for the star
connected broadcast cables and a 1024 word area of storage called the data file. The broadcast interface operates autonomously from
the BDX930 processor, and is designed so that, if all processors broadcast simultaneously, the broadcast receivers will still be fast
enough to receive and store all the information broadcast. The data file is divided into eight regions, one of which is used to hold
information to be broadcast, while the other seven regions are for the input of information received from up to seven other processors.
Thus, if a faulty processor broadcasts garbage, that garbage will all be placed in a specific region of every other processor's data file,
where it can be ignored and where it cannot damage the information being broadcast by other processors.
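
The containment property of the data file can be sketched as follows. The class below is a hypothetical software model, not a description of the broadcast hardware: the region size REGION_WORDS, the indexing of regions by processor number, and the wrapping of out-of-range offsets are all assumptions made for illustration. The point it shows is that the receiving side, not the sender, chooses the region, so a faulty sender can overwrite only its own region.

```python
REGIONS = 8          # one outgoing region plus seven incoming regions
REGION_WORDS = 128   # assumed region size, for illustration only


class DataFile:
    """Hypothetical model of one processor's data file."""

    def __init__(self, own_id):
        self.own_id = own_id
        self.regions = [[0] * REGION_WORDS for _ in range(REGIONS)]

    def receive(self, sender_id, offset, word):
        """Store a word broadcast by sender_id.

        The region is selected from the sender's identity, and the offset
        is wrapped into that region, so no other region can be touched.
        """
        if not 0 <= sender_id < REGIONS or sender_id == self.own_id:
            raise ValueError("invalid sender")
        self.regions[sender_id][offset % REGION_WORDS] = word & 0xFFFF  # 16 bit words
```

For example, if processor 3 broadcasts arbitrary words at arbitrary offsets, the regions holding the data received from processors 1 and 2 are unchanged.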

In SIFT, there is conceptually a single instance of each logical task but, for reliability, that task is actually replicated and executed
on three or five processors. Figure 2 shows a task b, replicated on three processors with its output being used by a task a, of which only
one replication is shown. The output of each replication of task b, a tuple of one or more words, is placed during execution in an
output buffer in that processor. Subsequently these results are copied from the output buffer into the output region of the data file and
are broadcast to all the other processors. At each of the processors, the various replications of the results of task b are received in the
regions of the data file corresponding to input from the various processors executing task b. In each of the processors, the three or five
versions of the results from task b are extracted from the data file by voting software and the majority result is placed in the input
buffer, from whence it can be obtained by any task that needs to use the results of task b. If there is no majority, a distinguished value
is placed in that buffer and any special action is then the responsibility of the tasks using that value. All results broadcast are voted in
every processor, even though possibly no task on that processor will use the voted value. Since voting takes time, the various words
that are components of the result of a task may be voted at different times.
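
The voting step can be sketched in a few lines. This is an illustration only: the name NO_MAJORITY stands for the distinguished value mentioned above, whose actual representation in SIFT is not given here, and the function votes on a whole result at once, whereas the text notes that the words of a result may be voted at different times.

```python
from collections import Counter

NO_MAJORITY = "NO_MAJORITY"  # hypothetical stand-in for the distinguished value


def vote(replicas):
    """Return the value held by a strict majority of replicas.

    replicas -- list of hashable results (for example, tuples of words),
                one from each processor executing the task.
    Returns NO_MAJORITY when no value is held by more than half of them.
    """
    if not replicas:
        return NO_MAJORITY
    value, count = Counter(replicas).most_common(1)[0]
    return value if 2 * count > len(replicas) else NO_MAJORITY
```

With three replications a single faulty result is outvoted, for example vote([(1, 2), (1, 2), (9, 9)]) yields (1, 2); with three mutually different results the distinguished value is returned and the consuming task must decide what to do.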

The voting software notes any discrepancies amongst the values on which it votes. A task error reporter, run periodically on every
processor, generates a synopsis of the errors detected on that processor and broadcasts the synopsis, as is shown in Figure 3. The global
executive task, which is replicated like other critical tasks, receives the error synopses broadcast from the various processors and decides
from them which processors are faulty. The global executive is responsible for the reconfiguration of the system, generating the
configuration of processors to be used, excluding the processors deemed faulty, and distributing the execution of application tasks
appropriate to the current phase of the flight among the configured processors. In each processor, the results from the various
replications of the global executive are voted and then used by the local executive task to select a task schedule for its scheduler, and to
set up the sets of processors executing each task for use by the voting software. Note that, while the global executive task is a replicated
and voted task common to the whole system, the error reporter and the local executive are tasks specific to each processor individually
and their results cannot be voted. Even though they are run on every processor, the results they generate relate to their own processor
alone. Care is taken in the design to ensure that errors in the results of an error reporter or local executive can damage only its own
processor.
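
The text does not state the rule by which the global executive decides which processors are faulty. The sketch below therefore assumes one plausible rule purely for illustration: a processor is deemed faulty when at least `threshold` distinct other processors report errors against it, so that a single faulty reporter cannot by itself remove a working processor. The function name and the data layout are hypothetical.

```python
def faulty_processors(synopses, threshold=2):
    """Decide which processors to exclude, from the error synopses.

    synopses  -- dict mapping reporter id to the set of processor ids
                 against which that reporter's voting noted discrepancies
    threshold -- assumed number of distinct accusers required
    """
    accusers = {}
    for reporter, accused in synopses.items():
        for p in accused:
            if p != reporter:  # a report against oneself is ignored here
                accusers.setdefault(p, set()).add(reporter)
    return {p for p, who in accusers.items() if len(who) >= threshold}
```

For example, if processors 0, 1 and 2 each report processor 3, while processor 3 reports all of the others, only processor 3 is excluded under this rule.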

The schedule for SIFT is designed so that different combinations of tasks can be executed on different processors. Replicated
tasks can therefore be executed at different times on different processors. Figure 4 shows a small part of an activity sequence on three
processors. The schedule is organized into equal subframes, which would typically be one or two milliseconds long, and are triggered
by interrupts from each processor's clock system, the only interrupts in the SIFT system. The sequence of activities to be performed by
a processor within a subframe is determined by a schedule table, which is selected from several such tables by the configuration
broadcast by the global executive. Within a subframe, the schedule can require a sequence of broadcasts, votes, and task executions.
Voting and task execution require the main processor and thus the time required for them is constrained by the length of the subframe.
The broadcasting mechanism of the broadcast interface operates autonomously from the processor. A task must have completed its
execution before its results can be broadcast, often in the next subframe, and voting of those results can begin in the subframe after that
in which the last of the three or five replications is broadcast. A task execution can use results voted earlier in the same subframe, but
the voting of results broadcast earlier in the same subframe is prohibited. This design decision was made to avoid a complex
asynchronous proof that the result will have been received before it is voted. The overhead associated with the handling of the clock
interrupt, together with the control exercised over the skew between clocks, is sufficient to ensure that results broadcast by a processor
in one frame can safely be voted at any time during the next frame by any other processor.
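
The ordering rules in this paragraph can be expressed as a simple check over a schedule. The representation below is hypothetical: a schedule is a list of subframes, each a list of (activity, task) pairs merged from all processors, with activities named "exec", "broadcast" and "vote". It does not reproduce the format of the SIFT schedule tables.

```python
def check_schedule(subframes):
    """Return a list of rule violations found in the schedule.

    Rules checked, following the text:
      * a task's results are broadcast only after it has executed;
      * results are voted only in a subframe later than the one in which
        the last replication of them was broadcast.
    """
    executed = set()      # tasks whose execution has completed
    last_broadcast = {}   # task -> subframe of its latest broadcast
    errors = []
    for i, activities in enumerate(subframes):
        for kind, task in activities:
            if kind == "exec":
                executed.add(task)
            elif kind == "broadcast":
                if task not in executed:
                    errors.append(f"subframe {i}: {task} broadcast before execution")
                last_broadcast[task] = i
            elif kind == "vote":
                if task not in last_broadcast:
                    errors.append(f"subframe {i}: {task} voted before broadcast")
                elif last_broadcast[task] >= i:
                    errors.append(f"subframe {i}: {task} voted in its broadcast subframe")
            else:
                raise ValueError(f"unknown activity {kind!r}")
    return errors
```

A schedule that executes task b in subframe 0, broadcasts it in subframe 1, and votes it in subframe 2 passes; one that broadcasts and votes b within the same subframe is rejected.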

Many of the flight control tasks require the same iteration rate, typically 10 to 20 iterations per second, but other tasks can be run
less frequently. This interval, within which these important flight control functions run, is known as the system frame. Other tasks such
as the global and local executives are also run within the same frame. Slower tasks, such as navigation tasks, are constrained to run in
frames that are simple integral multiples of each other and of the system frame. An execution window for each task is the interval of
time within which the task must be executed and its results voted. The stability of the control laws mechanized by the flight control
programs depends on avoiding long transport delays between the reading of sensor values and the commanding of actuator positions.
For faster flight control tasks, the execution window may be only a few subframes; slower tasks are less demanding and the execution
window for them is the whole of their longer frame.

The validity of the majority voting approach depends on all task replications on working processors generating identical results,
which in turn depends on these replications performing identical calculations on identical inputs. Where a task obtains inputs from
other tasks run at different iteration rates, the design must ensure that all replications of the task obtain their inputs from the same
iterations of the other tasks. In SIFT, this is ensured by a system involving auxiliary input buffers to preserve input values for use by
slower tasks, and odd/even double buffering to ensure that a task's inputs remain unchanged throughout a frame even though the next
set of input values are generated and voted at some time during that frame. Provided that the system remains safe, majority voting of
the results of replicated tasks suffices to ensure that all working processors obtain the same values for the results of those tasks. Where
an input is obtained from an unreplicated source, no such assurance applies. Not only may the result obtained from an unreplicated
source be erroneous, which the tasks using that value may be able to accommodate, but the faulty source might broadcast different
values to different processors, thus causing replicated tasks on those processors to obtain different results, destroying the utility of the
majority voting. In SIFT, a mechanism called interactive consistency [3] is used to ensure that all working processors obtain the same
value for any input derived from an unreplicated source, whether that be an unreplicated application task, a sensor, or an error
reporting task.
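
The odd/even double buffering can be sketched as follows. The class is a hypothetical illustration, not the SIFT buffer layout: values voted during frame k are written to the buffer that will be read in frame k + 1, so that reads made during frame k are unaffected by the voting that takes place during that frame.

```python
class DoubleBuffer:
    """Odd/even input buffers indexed by frame parity (illustrative)."""

    def __init__(self):
        self._slots = [{}, {}]  # slot 0 for even frames, slot 1 for odd

    def write(self, frame, name, value):
        """Store a value voted during `frame`, for use in the next frame."""
        self._slots[(frame + 1) % 2][name] = value

    def read(self, frame, name):
        """Read the input value that is stable throughout `frame`."""
        return self._slots[frame % 2][name]
```

For example, a value written during frame 0 is read during frame 1; a new value voted part way through frame 1 becomes visible only in frame 2, so every replication of a task running in frame 1 sees the same input whenever it happens to execute.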


22-4 


4.  An  Outline  of  the  Model  Hierarchy 

Figure 5 shows an outline of the various models and analyses that are used in the justification of the reliability of SIFT. Before
the individual models are described in detail, we give a description of their intent and interaction. On the right of the figure is a
hierarchy of models of the correct functional behavior of SIFT, while on the left are a set of analyses that yield the probability of that
correct behavior. The models at the bottom of the figure describe the hardware of SIFT, upon which the more abstract analysis is
based.

The models on the right of Figure 5 describe the intended functional behavior of the SIFT system. They form a hierarchy of
models, with the actual binary representation of the running SIFT programs, the BDX930 Program, as the base of the hierarchy. Each
of the models above that is an abstraction of that model, omitting some of the detail present in those actual programs and thus easier to
describe and understand. The highest model of the system, the I/O Model, describes the functional behavior that we desire from the
system when it is working correctly, and is thus the model of the system that we must demonstrate is indeed consistent with the
BDX930 Program. The intermediate models of the hierarchy are necessary so that the relationships between models are simpler and
easier to substantiate formally.

The I/O Model specifies SIFT as a system that mechanizes a transfer function (defined by the application programs) between the
sensor inputs and the actuator outputs of the system, provided that an uninterpreted predicate system safe, and its components task
safe, remain true. The model defines the input/output function performed by each application task, specifying that inputs be read and
the outputs generated at the right times. The model contains no description of processors, replication of tasks, or voting. The
mechanisms for obtaining reliability have been completely abstracted out of the description, and all that remains is the intent - a
reliable system to fly the aircraft.

The next model, the Replication Model, augments the I/O Model with the concepts of processors, replication, and voting, and
also the knowledge of which processors are working. This allows us to derive the predicates system safe and task safe from the poll sets
of processors whose results are to be voted for each task. However, this model contains no concept of resource allocation or scheduling.

The Broadcast Model increases the detail to include the concepts of resource usage and the allocation of resources through a
schedule. It must therefore include a much finer grain representation of time, and indeed describe the slight time skews between
processors and the clock synchronization constraints that must be met by the implementation. This is the only model that describes
asynchrony between processors; the more abstract models use the same time for all processors, while the more detailed models describe
single processors in isolation.

The Denotational Model is the first complete model of the system, described as a set of recursive functions. It could in principle
be executed by an appropriate machine. Its purpose is to provide a complete specification of the behavior of the various programs in
the SIFT system against which the validity of the actual implementation can be demonstrated. The various programs that form the
SIFT executive are written in Pascal and form the Pascal Implementation, from which is derived by compilation the BDX930
Implementation. This is the lowest level specification of the SIFT software. In section 10 we discuss the hardware specification levels.

The  functional  behavior  described  by  the  I/O  Model  is  assured  only  so  long  as  the  predicate  system  safe  remains  true.  The 
analyses  shown  on  the  left  of  Figure  5  provide  the  probability  that  system  safe  will  remain  true  and  hence  that  the  desired  functional 
behavior will continue. The BDX930 Fault Model describes the rates of occurrence of various kinds of fault behavior, distinguishing
only  between  faults  that  cause  the  same  erroneous  results  to  be  seen  and  reported  by  all  other  processors,  and  those  that  cause  different 
results  to  be  seen  and  thus  cause  conflicting  error  reports  that  could  confuse  the  global  executive. 

The Error Rate Analysis is used to determine the rates at which faults will cause errors, the rates at which those errors will be
detected, the probability that the error reports are clear enough that the Global Executive can be certain of its diagnosis, and the rates at
which the system can be reconfigured in order that the last vestiges of erroneous results can be removed from the system by the
majority voting.

The Reliability Analysis computes the probability that system safe remains true for the 10 hour flight duration, as
processors become faulty and are reconfigured out of the system. Both the Error Rate Analysis and the Reliability Analysis are Markov
models, whose state space must be demonstrated to be an abstraction of the states of the Replication Model, but whose transition rates
are determined by the simpler probabilistic models.


5.  Input/Output Model

The Input/Output Model of SIFT, the highest level model specifying functional behavior, defines the input/output
characteristics of tasks performed by SIFT. The model, specified axiomatically, defines the configuration of system tasks and expresses
the flow of information between tasks. Based on an abstract notion of time, which may be interpreted as subframe time, we refer to
iterations of a task taking place during various time intervals. The time interval for a particular iteration of a task is referred to as its
execution window, having a beginning time and an ending time. Each task uses as inputs the values produced by its input tasks and
produces one or more outputs during its execution window. Based on a high-level predicate specifying whether a task is safe during a
particular iteration of a task, the model defines that a task which is safe during an iteration will produce exactly one output value,
computed as a function of its input values. Provided that the entire system is safe throughout some interval (i.e., that all tasks are safe
for that interval), we can prove by induction that all tasks will compute correct functions of their intended inputs. This defines at a high
level what it means for SIFT to function correctly.

Conspicuously  absent  from  this  model  is  any  notion  that  a  task  is  replicated  and  computed  on  a  set  of  processors.  At  a  lower 
level, we shall explain that the value the I/O model defines as resulting from a given task iteration will actually be the outcome of a
majority vote of processors assigned to compute the task. The task safe predicate taken as primitive in the I/O model, defining when
a task can be relied upon to produce correct results, will be defined at a lower level to be a function of the amount of task replication
and the number of working processors.

Briefly, the model is organized as follows. Each task a in the set of all executive and application tasks Tasks computes a
(mathematical) function A of its input values. Inputs(a) denotes the set of tasks providing inputs to a. Recall that tasks do not all have
the same iteration rates. For task b ∈ Inputs(a), the most recently completed iteration of b prior to the execution window of the iteration
of a provides the input to an iteration of a. A derived function b to i of a denotes the iteration of b providing input to the i-th iteration
of a. During each iteration i of a task a, a(i) denotes the set of output values which may be produced. In order to map task iterations to
subframe time, the function i of a is used to denote the time interval $[t_1, t_2]$ comprising the execution window of the i-th iteration of a.
The functions beg(i of a) and end(i of a) are used to denote the beginning and end of the execution window, respectively.
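The derived function b to i of a can be illustrated with a short sketch. It assumes strictly periodic tasks whose execution windows are fixed by a period, a length, and an offset measured in subframes; these parameters, and the names `Task`, `window`, and `b_to_i_of_a`, are hypothetical conveniences and are not part of the SIFT specification, which leaves the schedule abstract at this level.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    period: int      # assumed: subframes between successive iterations
    length: int      # assumed: subframes spanned by one execution window
    offset: int = 0  # assumed: subframe at which iteration 0 begins


def window(task: Task, i: int) -> Tuple[int, int]:
    """Return (beg(i of task), end(i of task)) in subframe time."""
    beg = task.offset + i * task.period
    return beg, beg + task.length


def b_to_i_of_a(b: Task, a: Task, i: int) -> Optional[int]:
    """Index of the latest iteration of b that has ended by beg(i of a).

    Returns None when no iteration of b has completed yet.
    """
    beg_a, _ = window(a, i)
    # Largest j with b.offset + j * b.period + b.length <= beg_a.
    j = (beg_a - b.offset - b.length) // b.period
    return j if j >= 0 else None
```

With a fast task of period 1 feeding a slow task of period 4, successive iterations of the slow task skip over intermediate iterations of the fast one, which is the situation the auxiliary input buffers of Section 3 are meant to handle.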




The overall structure of task configurations within the I/O model is illustrated in Figure 6 shown below. For a task a such
that the predicate task a safe during i is true, a will produce exactly one output value during its execution window. A task which is not
safe during its iteration may produce any number of outputs. Because the configuration of tasks is different for different phases of the
flight, not all tasks necessarily compute each iteration. An uninterpreted predicate a on during i determines whether a(i) is expected to
compute a function of its inputs or to return a special X element as its value.

Within the I/O model the interactive consistency algorithm is defined as a special form of task. For such a task a, satisfying the
predicate i/c(a), its associated function A is the identity function. Recall from our discussion in Section 3 that the interactive
consistency algorithm is used in order for multiple processors reading unreplicated (and possibly unstable) input to reach agreement on
an input value. As we explain below, a safe interactive consistency task will always produce a single output value.

Based on these primitive functions and predicates, the I/O model contains seven axioms, expressing constraints on the schedule
defining when task iterations are to take place and that safe tasks compute functions of their designated inputs. We do not illustrate the
entire set of axioms here. The axioms related to the scheduling of task iterations are straightforward. They express basic requirements
that successive iterations of a task are properly ordered in time and that the execution window of a task b must precede the execution
window of a task a to which it provides input.

The  major  axiom  defining  the  Input/Output  behavior  of  a  task  is  the  following: 

\[
\begin{array}{l}
\forall a \in \mathit{Tasks}\ \ \forall i\ \ \forall v \\
\quad a \text{ on during } i \ \wedge\ \text{task } a \text{ safe during } i \ \wedge \\
\quad \forall b \in \mathit{Inputs}(a)\ \ b(b \text{ to } i \text{ of } a) = \{ v_b \} \\
\quad \supset \\
\quad a(i) = \{ A(v_{\mathit{Inputs}(a)}) \}
\end{array}
\]

where $v_{\mathit{Inputs}(a)} = \{ v_b \mid b \in \mathit{Inputs}(a) \}$. This axiom defines that any iteration of a task a, such that (1) a is both on and safe and (2) each
task b providing input to the i-th iteration of a returned exactly one output value during its corresponding iteration, will return
exactly one output during its iteration. The value produced will be that resulting from applying its designated function A to the set of
values produced by its input tasks. Thus, provided a is safe and its input is stable, it will correctly compute an output value.
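As a reading aid, the axiom can be paraphrased as a small function. This is only a sketch: the name `io_output`, the dictionary representation of the input sets, and the convention of returning `None` where the axiom imposes no constraint are assumptions made for illustration.

```python
def io_output(on, safe, input_sets, A):
    """Return a(i) where the main I/O axiom determines it, else None.

    on, safe   -- truth values of "a on during i" and "task a safe during i"
    input_sets -- maps each input task b to the set b(b to i of a)
    A          -- the function computed by task a, applied to {b: v_b}
    """
    if on and safe and all(len(s) == 1 for s in input_sets.values()):
        # Every input task produced exactly one value: extract v_b for each b.
        v = {b: next(iter(s)) for b, s in input_sets.items()}
        return {A(v)}
    # The axiom says nothing here: an unsafe task may produce any number
    # of outputs, and an unstable input leaves the result unconstrained.
    return None
```

For example, `io_output(True, True, {"x": {1}, "y": {2}}, lambda v: v["x"] + v["y"])` yields the singleton set `{3}`, whereas an input set such as `{1, 5}` leaves the result undetermined.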

In the case of interactive consistency tasks, one additional axiom governs their input/output characteristics:

\[
\forall a \in \mathit{Tasks}\ \ \forall i\ \ \exists v \quad (\, i/c(a) \ \wedge\ \text{task } a \text{ safe during } i \,) \ \supset\ a(i) = \{ v \}
\]

This defines that an interactive consistency task which is safe during its iteration will always produce a single value as output. By the
previous axiom, if its input task is safe and thus provides a single output, the interactive consistency task will perform its associated
function (in this case the identity function) on the input. Even if the input task is not safe, however, the current axiom defines that some
output value will be produced.

These are the major axioms of the I/O model. In the next section, we present the next lower-level model and show how the
primitives and stated axioms of the I/O model are supported at the next level.


6.  The Replication Model

The axiomatically-specified Replication model, at the next lower level, introduces the notion that tasks are replicated and
executed by some number of processors. Based on a high level concept of each processor communicating its results to all other
processors, a specification of the majority voting performed by each processor is given. Also defined is the information flow through
which error reports from individual processors are provided to the global executive. This information is used by the global executive in
order to diagnose processor faults and remove from the configuration processors deemed to have solid faults.

The  concept  of  task  scheduling  has  been  refined  to  define  not  only  the  execution  window  for  task  execution  but  also  the  set  of 
processors assigned to execute the task. The function poll for i of a denotes the set of processors assigned to compute the i-th iteration
of task a. The I/O model primitive predicate a on during i is derived within the Replication model as ∃p ∈ poll for i of a.

With the concept that a processor computes an iteration of a task comes the primitive function a(i) on p which denotes the set of
outputs produced by processor p for the i-th iteration of task a. In a manner left unspecified by this level model, processor p
communicates its results to all other system processors. The primitive function a(i) on p in q denotes the set of values that processor p
has reported to processor q for the i-th iteration of a. A derived function a(i) in q is used to define the result of processor q voting on
the output of the i-th iteration of a based on the results communicated to it. As we shall show shortly, the I/O primitive a(i) for a safe
task iteration will be derived as the value a majority of assigned processors obtained by their voting. All processors are required to
report the results of each task computation to all processors, and all processors are required to vote on all received values.

One other newly introduced derived function appears in the Replication model. The DWindow for b to i of a is defined to be the
data window, consisting of the time interval starting at beg((b to i of a) of b) and ending at end(i of a). Based on this function, we define
DWindow for i of a to be the interval extending from the beginning of the execution window of the earliest input task to a and extending
to the end of the execution of i of a.

The overall structure of the Replication model is illustrated in Figure 7. The task structure shown is a refinement of the task
configuration illustrated in Figure 6.

With the concept of processor computation occurring in the Replication model, the task safe predicate appearing as primitive
within the I/O model can now be derived within the Replication model in terms of working processors. The Replication model
includes an uninterpreted variable S which denotes the set of properly functioning processors at any given time. $S^{[t_1, t_2]}$ denotes the set
of processors properly functioning during the interval $[t_1, t_2]$. This variable must remain uninterpreted in all lower level models as well,
since the implementation will never have perfect information concerning the set of correctly functioning processors. Using this concept
of the set of working processors, we can now derive the task safe predicate of the I/O model as follows.
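\[
\begin{array}{l}
\text{task } a \text{ safe during } i \ \equiv \\
\quad \text{if } \mathit{voted}(a): \\
\qquad \bigl( 2 \times \lvert\, \text{poll for } i \text{ of } a \ \cap\ S^{\text{DWindow for } i \text{ of } a} \,\rvert \bigr) > \lvert\, \text{poll for } i \text{ of } a \,\rvert \\
\qquad \vee\ \neg\, a \text{ on during } i \\
\quad \text{if } i/c(a): \ \ldots
\end{array}
\]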


In the above definition, |s| denotes the cardinality (i.e., number of elements) of the set s. The definition states that a task is safe either if
a majority of the processors assigned to compute the task are working for the data window of the task or if the task is not on during i. It
is necessary that the processors function correctly for the entire data window of the task in order that we can be assured that a
processor will not corrupt its input data prior to its use. We omit discussion of the conditions necessary to define the safety of
interactive consistency tasks.
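The voted-task case of this definition amounts to a strict-majority test on set cardinalities, as the following sketch shows. The function name `task_safe` and its arguments are hypothetical; `working` stands for the (in practice unknowable) set of processors functioning throughout the data window, and the interactive consistency case is left out, as in the text.

```python
def task_safe(poll, working, on=True):
    """Voted-task case of "task a safe during i".

    poll    -- processors assigned to the iteration (poll for i of a)
    working -- processors functioning over the whole data window
    on      -- whether the task is on during this iteration
    """
    if not on:
        return True  # a task that is not on is trivially safe
    poll = set(poll)
    # Strict majority: 2 * |poll ∩ working| > |poll|.
    return 2 * len(poll & set(working)) > len(poll)
```

Note that the inequality is strict: with four assigned processors, two working replicas do not make the task safe.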

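Based on these concepts, we can now define derived functions a(i) in q and a(i). Definition of the latter function will provide the
mapping up to its use as a primitive function in the I/O model. a(i) in q is defined by:

\[
\begin{array}{l}
a(i) \text{ in } q \ = \\
\quad \text{if } q \in S^{\text{DWindow for } i \text{ of } a} \ \wedge\ \text{task } a \text{ safe during } i \\
\quad \text{then } \mathit{maj}(\, \mathit{bag}(\, a(i) \text{ on } p \text{ in } q \ :\ p \in \text{poll for } i \text{ of } a \,)\,)
\end{array}
\]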

In the definition, bag is a function creating a bag (1) with the specified elements, and maj is a function returning a (singleton) set
containing the majority value of its singleton set arguments. This definition defines that the value a working processor q obtains for a
safe task a will be the value reported to q by a majority of the processors assigned to compute the i-th iteration of a. From the
definition of task safe, one can see that task safety implies a majority of working processors assigned to compute the task. Because our
voting axiom (given shortly) will ensure that a working processor will produce only a single output during its execution window, we can
be assured that the majority of the singleton sets reported for a safe task will indeed be the majority computed by all assigned
processors.
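A minimal sketch of maj and bag follows, with a Python list playing the role of the bag. Two behaviors are assumptions rather than anything stated in the text: reports that are not singleton sets are counted in the size of the bag but can never contribute to a majority, and the empty set is returned when no value has a strict majority. The names `maj` and `voted_value` are illustrative.

```python
from collections import Counter


def maj(bag):
    """Return {v} if value v is reported by a strict majority of the bag, else set()."""
    reports = list(bag)
    # Only singleton reports name a definite value.
    counts = Counter(next(iter(r)) for r in reports if len(r) == 1)
    if counts:
        value, n = counts.most_common(1)[0]
        if 2 * n > len(reports):
            return {value}
    return set()


def voted_value(reports_in_q, poll):
    """a(i) in q: vote over what each p in the poll set reported to q.

    reports_in_q maps processor p to the set a(i) on p in q; a processor
    that reported nothing contributes the empty set.
    """
    return maj(reports_in_q.get(p, set()) for p in poll)
```

With three assigned processors, two agreeing reports outvote one dissenting or missing report, which is the property the task safe predicate relies on.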

We  now  define  the  derivation  of  a(i)  as: 

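\[
\begin{array}{l}
a(i) \ = \\
\quad \text{if task } a \text{ safe during } i \\
\quad \text{then } a(i) \text{ on } p \ :\ p \in (\, \text{poll for } i \text{ of } a \ \cap\ S^{\text{DWindow for } i \text{ of } a} \,)
\end{array}
\]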

The value of the i-th iteration of a safe task a used in the I/O model is thus the singleton set that any working processor assigned to
compute i of a obtains through voting. We are guaranteed (and can prove as a theorem) that all such processors will obtain the same
result.

With these functions and predicates, the axioms of the Replication model can be stated. The model consists of ten axioms. By
using the derived functions and predicates, many of the axioms appear identical to those of the I/O model. The difference is of course
that each axiom is expressed in terms of Replication model primitives rather than I/O model primitives. Our main execution axiom in
the Replication model is:

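\[
\begin{array}{l}
\forall a \in \mathit{Tasks}\ \ \forall i\ \ \forall p \in (\, \text{poll for } i \text{ of } a \ \cap\ S^{\,i \text{ of } a} \,)\ \ \forall v \\
\quad a \text{ on during } i \ \wedge\ \text{task } a \text{ safe during } i \ \wedge \\
\quad \forall b \in \mathit{Inputs}(a)\ \ b(b \text{ to } i \text{ of } a) \text{ in } p = \{ v_b \} \\
\quad \supset\ a(i) \text{ on } p = \{ A(v_{\mathit{Inputs}(a)}) \}
\end{array}
\]

where $v_{\mathit{Inputs}(a)}$ is defined as in the I/O model. This axiom, quite similar to its counterpart in the I/O model, defines that a
working processor p will compute the proper function of its input values for a task a, provided a is safe. The interactive consistency
axiom is exactly as given in the I/O model.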

Also included in the Replication model are axioms defining the error reports that each processor must file when discrepancies are
discovered during voting. Each processor contains a special error reporting task err. Any working processor p which detects that the
value a processor q reported for the i-th iteration of a task a disagrees with the voted value is required to submit an error report via the
processor's error reporting task. Only under these circumstances can a working processor report discrepancies. The error reporting tasks in turn provide the reports as
input to the global executive. In the Fault Diagnosis model we specify the algorithm employed by the global executive in its
determination of who is at fault and whether a solid or transient fault has occurred.

Using the derivations for the primitives of the I/O model that we have given, one must show that each axiom of the I/O model is
provable as a theorem within the Replication model.

(1) A bag is a "set" which can contain multiple copies of the same element.




7.  The  Broadcast  Model 

The axiomatically specified Broadcast model occurs at the next lower level in the specification hierarchy. At this level, a more
explicit model of the actual scheduling of broadcast, voting, and task execution activities is introduced. While the Replication model
defined the effect of the communication between processors and of task execution, it did not define the means by which this is
achieved. The current model defines the sequence of activities, derived from the schedule table, which is to support the specified
effect. The a(i) on p in q primitive function within the Replication model is refined to define the broadcast mechanism responsible for
communicating the value of a(i) on p to each of the other processors. Based on the specification of the activity schedule present for
each processor, specific requirements on the scheduling of the information flow through the system are formulated. Among these
requirements are (1) each iteration of each task is scheduled sufficient execution time, (2) the broadcast of each task output is
performed within the required period and after the completion of the task iteration, (3) voting on each task result occurs within the
required frame, after the broadcast has been received and prior to its use as input, and (4) all activities scheduled for a given subframe
have sufficient time to complete.

As we explained in Section 3, four different kinds of activities are involved in the operation of SIFT. Within the Broadcast model
these are represented as

•  <"Overhead">

•  <"Broadcast", a>

•  <"Vote", a, s>

•  <"Execute", a, start, finish>

The "Overhead" activity represents the overhead period occurring at the beginning of each subframe, and the "Broadcast" activity
initiates the asynchronous broadcast of the output of task a to all processors. Recall that voting on a task result may occur in stages.
Within this model we explicitly represent the output of a task as a sequence of values (based on the number of machine words
representing the result). A particular "Vote" activity votes on a subsequence s of the output sequence resulting from the execution of
task a. The specification of the "Execute" activity for a task a includes an indication of whether this is the start of the task iteration, an
intermediate execution, or the finish of the iteration.

The primitive function sched(c,t,p) within the model denotes the schedule table, specifying the sequence of activities to be
executed by processor p during subframe time t when in configuration c. As discussed in Section 3, the configuration at a particular
time consists of a mapping from each task to the set of processors which have been assigned to execute the task. The configuration is
calculated once per frame by the global executive and broadcast to all processors. The function config(t,a) denotes the set of processors
in the configuration for task a at subframe time t. The poll for i of a primitive function in the Replication model is mapped up from the
Broadcast model as config(beg(i of a), a), i.e., as the configuration present at the beginning of the execution window for the i-th iteration
of task a.

From the schedule table sched the actual schedule of activities for each processor is determined. During a subframe, each
processor performs the sequence of activities indicated by the schedule table. Within the model, an activity time t_p denotes a finer grain
of time than subframe time. It is used to order the cumulative activities performed by a processor p. Two auxiliary functions are used
to convert between subframe time and activity time. The function subframe_p(t_p) maps an activity time t_p on processor p to the subframe
in which the activity is performed. In the other direction, the function start_p(t) maps a subframe time to the activity time of the first
activity of processor p in subframe t. Based on activity time, the derived function schedule(t_p, p) denotes the activity performed by
processor p at activity time t_p. This history of processor activity is derived from the schedule table and the changes of configuration
mandated by the global executive.
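The relationship between subframe time and activity time can be sketched as follows for a single processor. The sketch assumes a fixed configuration, so that the schedule table reduces to a list of activity lists indexed by subframe; the function name `build_activity_timeline` and the tuple encoding of activities are illustrative assumptions, not part of the model.

```python
def build_activity_timeline(sched_for_p):
    """Flatten per-subframe activity lists into an activity-time history.

    sched_for_p[t] is the sequence of activities processor p performs in
    subframe t. Returns three lists:
      schedule[tp]    -- activity performed at activity time tp
      subframe_of[tp] -- subframe in which activity time tp falls
      start_of[t]     -- activity time of the first activity of subframe t
    """
    schedule, subframe_of, start_of = [], [], []
    for t, activities in enumerate(sched_for_p):
        start_of.append(len(schedule))
        for act in activities:
            schedule.append(act)
            subframe_of.append(t)
    return schedule, subframe_of, start_of


# Hypothetical two-subframe table for one processor.
table = [
    [("Overhead",), ("Execute", "a", True, True)],
    [("Overhead",), ("Broadcast", "a"), ("Vote", "a", (0,))],
]
schedule, subframe_of, start_of = build_activity_timeline(table)
```

Here `subframe_of` and `start_of` play the roles of the two auxiliary conversion functions, and `schedule` corresponds to the derived per-processor activity history.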

Figure 8 below illustrates the flow of information through the system as a result of the Broadcast, Vote, and Execute activities.
The functions


•  output for a on p(t_p)

•  datafile out for a on p(t_p)

•  datafile in q for a on p(t_q)

•  input n in q for e of a at f(t_q)

denote the values of each of these data structures at each activity time.


The output buffer for task a on processor p is modified during the Execute activities performed for the task. As a result of a
Broadcast activity on processor p for task a, the value of output for a on p is transferred to datafile out for a on p and an asynchronous
broadcast is initiated. Sometime later (as we discussed in Section 3), the value broadcast is received in datafile in q for a on p within
each processor q. As the result of a Vote activity <"Vote",a,s> on processor q, a particular subsequence s is extracted from the result for
task a received in each of the datafile in buffers, and voting, based on the designated poll set, is performed. The result of the vote is
placed in a set of input buffers input n in q for e of a at f for each element e contained in the sequence s. As we explained in Section 3,
because of disparate iteration rates of the tasks it is necessary to double buffer the results of the voting and to maintain separate copies
of the results for each task iteration frequency depending on the value. The n parameter is the boolean value selecting the buffer for
receipt of the value, while f is the frequency parameter quantified over all iteration rates for which there is a task depending on the
output of task a. The choice of buffer in the double buffering scheme is accomplished using two primitive functions wrtselect(t_q,a) and
rdselect(t_q,a) to select the appropriate buffer for writing and reading, respectively, the result for task a at activity time t_q.
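The odd/even double buffering idea can be sketched in a few lines. In the model wrtselect and rdselect are primitive functions whose definitions are not given here; the sketch simply assumes that they follow the parity of a frame counter, which is one plausible reading of "odd/even" and not a statement of the actual SIFT implementation. The class name `DoubleBuffer` is hypothetical.

```python
class DoubleBuffer:
    """Two-slot buffer: writers fill one slot while readers use the other."""

    def __init__(self, initial=None):
        self.slots = [initial, initial]

    @staticmethod
    def wrtselect(frame):
        # Assumed rule: values voted during frame k go to slot k mod 2.
        return frame % 2

    @staticmethod
    def rdselect(frame):
        # Assumed rule: readers in frame k use the slot written in frame k - 1.
        return (frame + 1) % 2

    def write(self, frame, value):
        self.slots[self.wrtselect(frame)] = value

    def read(self, frame):
        return self.slots[self.rdselect(frame)]
```

Under this rule a value voted at any point during frame k becomes visible only in frame k + 1, so a task's inputs cannot change underneath it in the middle of a frame.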

Within the Broadcast model is the first indication that the SIFT system is not synchronous. Associated with each processor p is a
function real_p(t_p) which maps activity time on processor p to real time. As we discussed in Section 3, because of clock skew and
transport delay within SIFT, the processors will not be synchronized. In order for the system to function correctly, it is necessary that
the clocks remain within a specified tolerance of each other -- to do so is the responsibility of the clock synchronization task which is
part of each processor's Local Executive. As we discussed earlier, SIFT is carefully designed so that the distributed system is effectively
synchronous. Assuming the correctness of the clock synchronization algorithm, asynchronism caused by processor clock skew has no
external effect. In the case of an asynchronous Broadcast activity, for example, our specifications define the value at the destination
only after the latest time at which the broadcast could have been completed given the maximum processor skew. It is incumbent upon
us to prove that no access will be attempted of the data before this time in order to map this asynchronous system up to the higher-level
synchronous Replication and I/O models.
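The proof obligation described above can be pictured as a simple timing check. The sketch below is a deliberately simplified assumption: it supposes that the worst case is the sum of a maximum transfer time and a maximum clock skew, all expressed in the same integer time unit, and the names `earliest_defined_time` and `read_is_defined` are hypothetical. The actual bounds used for SIFT are not given in this section.

```python
def earliest_defined_time(initiated_at, max_transfer, max_skew):
    """Earliest time at which the destination value is specified.

    initiated_at -- time the Broadcast activity starts, on the sender's clock
    max_transfer -- assumed upper bound on the time the transfer takes
    max_skew     -- assumed upper bound on the skew between two working clocks
    """
    return initiated_at + max_transfer + max_skew


def read_is_defined(read_at, initiated_at, max_transfer, max_skew):
    """True if a read at read_at (reader's clock) sees a specified value."""
    return read_at >= earliest_defined_time(initiated_at, max_transfer, max_skew)
```

A schedule would satisfy the obligation if every Vote activity that consumes a broadcast value passes this check for the corresponding Broadcast activity.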

Based on the Broadcast-level primitive functions, the high-level operations representing information flow given in the Replication
model can be refined to define information flow directly relating to actual data structures present in the SIFT system. The a(i) on p
primitive in the Replication model can be derived in terms of the value of output for a on p at the finish of the Execute activity for a
on processor p. The primitive function a(i) on p in q can be derived in terms of the value of datafile in q for a on p at the times of the
Vote activities for a's result. The function a(i) in q within the Replication model can similarly be derived as the values of the
input n in q for e of a at f for the appropriate buffer number n and for each element e of the result.

The execution and data windows for each iteration of each task are present within the Broadcast model and form envelopes
during which certain activities must be completed. In particular, all execution activities and all processors' voting on the results must be
scheduled within the execution window, and all execution windows of all input tasks must be contained in the data window for the task.

In terms of these primitives, we can now illustrate the axioms defining the Vote and Execute activities, corresponding to the
definition of voting and the main execution axiom of the Replication model given in the previous section. We first give the axiom defining
the Vote activity.

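\[
\begin{array}{l}
\forall p\ \ \forall a\ \ \forall s\ \ \forall t_p\ \ \forall e \in s\ \ \forall f \geq \mathit{rate}(a) \\
\quad p \in S^{[t_p,\,t_p]} \ \wedge\ \mathit{schedule}(t_p, p) = \langle \text{"Vote"}, a, s \rangle \\
\quad \supset \\
\quad \text{input } \mathit{wrtselect}(t_p, a) \text{ in } p \text{ for } e \text{ of } a \text{ at } f\,(t_p + 1) = \\
\qquad \mathit{maj}(\mathit{bag}(\, v_q(e) \ :\ q \in \mathit{config}(\mathit{subframe}_p(t_p), a) \ \wedge\ v_q = \text{datafile in } p \text{ for } a \text{ on } q\,(t_p) \,)) \\
\quad \wedge\ \text{"nothing else changed"}
\end{array}
\]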

In this axiom, rate(a) is a function derived from the schedule table and denotes the iteration rate of task a. "Nothing else changed" is used
informally here to indicate that no other data structures are affected by the Vote activity. Briefly, the axiom states that if a working
processor p has been scheduled to vote on a subsequence s of task a's output at activity time t, then the result at activity time t+1 will
be that the input buffer selected by wrtselect for each element e of s will have received the majority value from the datafile buffers
corresponding to each processor in the configuration.
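
The voting step that the axiom describes can be illustrated with a minimal Python sketch. It is not the SIFT executive code: the dictionary layout of the datafile, the function names majority and vote, and the sample values are assumptions made purely for illustration.

```python
from collections import Counter


def majority(values):
    """Return the value held by a strict majority of `values`, or None if there is none."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def vote(datafile, config, task, elements):
    """Vote each element of a task's result across the processors in `config`.

    datafile[(task, q)] is assumed to hold the result elements broadcast by
    processor q. The returned mapping {element index: majority value} stands
    for what would be written into the selected input buffer.
    """
    return {
        e: majority([datafile[(task, q)][e] for q in config])
        for e in elements
    }


# Hypothetical datafile contents: processor 3 delivers a wrong second element.
datafile = {
    ("a", 1): [10, 20],
    ("a", 2): [10, 20],
    ("a", 3): [10, 99],
}
voted = vote(datafile, config=[1, 2, 3], task="a", elements=[0, 1])
```

With three processors in the configuration, the single erroneous value is outvoted and never reaches the input buffer.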

For  the  Execute  activity  we  have  the  following  axiom: 

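$$
\begin{aligned}
&\forall p\ \forall a\ \forall t\ \forall \mathit{start}\ \forall \mathit{finish} \\
&\quad p \in \mathit{working}(t)\ \wedge\ \mathit{schedule}(t, p) = \langle \text{"Execute"}, a, \mathit{start}, \mathit{finish} \rangle\ \wedge \\
&\quad \forall b \in \mathit{inputs}(a)\ \forall\, 1 \le e \le \mathit{result.size}(b): \\
&\qquad \mathit{finish}\ \wedge \\
&\qquad \text{input } \mathit{rdselect}(t, \mathit{minrate}(a, b)) \text{ in } p \text{ for } e \text{ of } b \text{ at } \mathit{rate}(\mathit{minrate}(a, b))(t) = v_{b,e} \\
&\quad \supset \\
&\quad \text{output for } a \text{ on } p\,(t+1) = A(\mathit{values}(a))\ \wedge\ \text{"nothing else changed"}
\end{aligned}
$$

where

$$
\mathit{values}(a) = \{\, v_{b,e} \mid b \in \mathit{inputs}(a)\ \wedge\ 1 \le e \le \mathit{result.size}(b) \,\}
$$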

Despite the forbidding appearance of the axiom, the behavior specified is quite simple. A working processor p which is scheduled to
perform an Execute activity for a at activity time t will do the following. If this is the finishing Execute activity for the iteration, and if
v_{b,e} is the value in the selected input buffer for the eth element in the value that processor p has voted for the result of b, then at activity
time t+1 the output buffer for a on p will contain the result of applying a's characteristic function A to all input values. In doing
the selection of buffer in the double buffering scheme, the function minrate(a,b) is used to select the slower of the two tasks for
determination of the proper buffer.
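
The role of the slower task in buffer selection can be sketched as follows. This is only an illustration under stated assumptions: rates are represented here as iteration periods measured in activity slots (so the slower task has the larger period), the two buffers are numbered 0 and 1, and the function names slower_period, write_buffer and read_buffer are hypothetical stand-ins rather than the definitions of minrate, wrtselect and rdselect used in the specification.

```python
def slower_period(period_a, period_b):
    """Assumed stand-in for minrate: the slower task has the longer period."""
    return max(period_a, period_b)


def write_buffer(t, period):
    """Buffer (0 or 1) being filled during the period that contains time t."""
    return (t // period) % 2


def read_buffer(t, period):
    """Readers use the buffer that is not currently being written."""
    return 1 - write_buffer(t, period)


# Task a iterates every 2 slots, its input task b every 4 slots (invented values).
period = slower_period(2, 4)
selected = [(t, write_buffer(t, period), read_buffer(t, period)) for t in range(8)]
```

Switching buffers at the pace of the slower task keeps the values seen by the reader stable for a whole iteration of that task.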

Recall that within the higher level models the interactive consistency algorithm was specified as being performed by a special
task. This is refined within the Broadcast model to actually consist of a number of "Broadcast" (and in some cases "Vote") activities.

One can see from the brief description of the predicates and functions comprising the Broadcast model of SIFT and from the
axioms for voting and task execution that the structures and concepts defined are approaching those employed in the implementation.
Our specifications are becoming progressively more detailed and lower-level. In viewing this level of specification and all lower levels
of specification, the need for more abstract system models in determining system correctness should be apparent.


8.  The Denotational and Imperative Models

All the models previously presented have been axiomatically specified, stating required properties of primitives within the model
without giving complete functionality. The Denotational Model is the first complete model of the system. It specifies a set of recursive
functions comprising the SIFT executive which is replicated on each processor. It could in principle be executed by an appropriate
machine, albeit with excruciating inefficiency. Its purpose is to provide a complete specification of the behavior of the various
programs in the SIFT system against which the validity of the actual implementation can be demonstrated. Consequently it is highly
constrained by the needs of the program verification system.


The various programs that form the SIFT executive are written in Sequential Pascal and form the Pascal Implementation, from
which is derived by compilation the BDX930 Implementation. These represent imperative models of the system, the latter of which
is the actual SIFT implementation. Many programs in the system can be proved correct in their Pascal representation, and such
proofs are simpler because it is easier to understand the intent of the Pascal program. The Pascal program is written within the limits
imposed by a sequential programming language. As such, real-time communication via external interrupts and message passing
through the data files cannot be represented within the program. Such aspects of the SIFT executive are represented through changes
to special variables and flags not assigned within the Pascal program. Because of this, a few portions of the system cannot easily be
verified in Pascal, for instance the scheduler and the clock synchronization code. Any verification that is attempted on the basis of the
Pascal code must be predicated upon the proof that the clock synchronization algorithm allows one to reason about certain segments of
the code as sequential programs with no external interference. Any such verification of Pascal functions also depends upon a proof that
the translation from Pascal to BDX930 code is correct. It is therefore anticipated that the majority of the formal proof of
correspondence will be between the Denotational Model and the BDX930 Implementation. Where Pascal program proofs are
available, the various lemmas and invariants developed for that proof can be mapped down into the BDX930 Machine Code proofs,
greatly speeding their construction.

The formal verification of the BDX930 Implementation is performed by reference to the BDX930 Specification of the behaviour of
the processor and associated hardware. Developed for this purpose is a formal interpretive model for the BDX930 machine, written in
Boyer-Moore recursive function theory and derived from an ISPS semi-formal machine specification. Here too design faults might lurk,
and it is necessary to demonstrate the consistency of the hardware with its specification. This will be a two step procedure. First it
must be shown that the BDX930 Microprogram executing on the BDX930 Microprogram Processor is consistent with the BDX930
Specification, and then it must be shown that the BDX930 Logic Design correctly implements the BDX930 Microprogram Processor
Specification.

We do not discuss these low level models within this paper.


9.  Reliability  Analysis 

The purpose of the various analyses culminating in the reliability analysis is to estimate the probability that the SIFT system will
enter a state in which system safe is not true, placing an upper bound on the probability of system failure. Two primary analyses are
involved: the Error Rate Analysis, which involves the rate of detection, reconfiguration, and error masking, and the Reliability Analysis,
which investigates system safe.

The Reliability Analysis is based on a discrete Markov model containing very many states. Fortunately it is possible to perform
an analytic reduction of these states, discarding states with negligible occupancy and combining together many other states until the
number of states becomes tractable. This reduced Markov model has states described by three coordinates: the number of processors
remaining in the configuration, the number of processors in the configuration that have solid faults, and the number of transient faults
that have occurred and whose erroneous results have not yet been completely masked by the majority voting. The reduced Markov
model is evaluated by successive squaring of the transition matrix. Figure 9 shows the model in the plane where the number of
transient errors is zero. Note that in some of these states system safe can be false, where a second or third fault has occurred before the
system has completed the reconfiguration from an earlier fault, or where the system has exhausted its supply of spare processors.
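
As an illustration of evaluation by successive squaring, the following Python sketch raises a small transition matrix to the power 2^k by squaring it k times. The three-state chain and its transition probabilities are invented for the example; they are not the reduced SIFT model or its rates.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def power_by_squaring(p, squarings):
    """Return P raised to the power 2**squarings, using `squarings` multiplications."""
    for _ in range(squarings):
        p = matmul(p, p)
    return p


# Hypothetical chain: 0 = all good, 1 = fault being handled, 2 = failed (absorbing).
P = [
    [0.999, 0.001, 0.0],
    [0.9, 0.099, 0.001],
    [0.0, 0.0, 1.0],
]

P_long = power_by_squaring(P, 10)  # transition probabilities over 1024 steps
p_fail = P_long[0][2]              # probability of having failed, starting from state 0
```

Ten squarings cover 1024 steps of the chain, so a long mission time is reached with a number of matrix multiplications that grows only logarithmically with the number of steps.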

The validity of the Markov analysis depends on the correspondence of the state space to the states of the actual system, the
approximations introduced by the analytic reduction and the evaluation method, and the justifiability of the transition rates between
states. The correspondence with the actual system is substantiated by demonstrating that the states of the Markov model are an
abstraction of the states of the Replication Model. Approximations introduced by the reduction and evaluation can readily be shown to
be negligible. The justification of the transition rates, which are derived from the Error Rate Analysis, is less easy. A Markov analysis
requires that the transitions are independent of each other and satisfy a Poisson distribution. However, it is clear that these
requirements are not completely, or even substantially, satisfied for the recovery transitions. At present we depend on a subjective
assessment that the actual distributions are reasonably approximated by the assumed Poisson distributions, and on sensitivity analyses
that demonstrate the effects of different assumptions for those distributions. Figure 10 shows an example of the results from the
Reliability Model.

The Error Rate Analysis is a similar Markov model, whose purpose is to investigate (1) the rate at which faults cause errors to be
generated, (2) the rate at which such errors are detected, (3) the rate at which the Global Executive algorithm can diagnose the errors
and reconfigure the system, (4) the probability that the error reports are simple enough for the Global Executive to be certain of
making a correct diagnosis, and (5) the rate at which the erroneous information generated by the fault is masked by the majority voting.
This last rate is, of course, the rate of importance to the Reliability Analysis, for it determines the rate at which the system becomes
immune to a further fault. As above, it is necessary to demonstrate that the state space for the Error Rate Analysis is an abstraction of
the states of the Replication Model, and that the behavior of the algorithm for the Global Executive corresponds to that implemented.

Much of the interest of the Error Rate Analysis concerns the algorithms used by the Global Executive to diagnose transient faults
and faults that generate conflicting error reports, which therefore must be represented by the BDX930 Fault Model. Transient faults,
which generate errors for only a short period of time, and which may be sufficiently frequent to be a significant factor in the reliability
of the system, are easy to represent. Conflicting error reports can be generated by two or more faults or by a single fault in the
broadcast interface which causes a broadcast result to be seen differently by other processors. A single, very malicious fault of this type
could persuade a naive Global Executive to discard a succession of good processors until the system fails, indicating that accurate
analysis of such faults is essential. The great majority of faults, not involving the broadcast interface, are not differentiated, for the
analysis makes no assumptions about the behavior of a failed processor, allowing it even to generate entirely correct results, provided
only that it remains uncorrelated with other faults. Our proof of correct operation, while system safe is true, is required to be sound for
any form of behavior by a faulty processor, and should therefore be valid for the behavior of actual faulty processors.


10.  Current  Status  of  Project 

At the time of writing this draft of the paper, the major portion of system specification has been completed and the verification
effort has been initiated. The axiomatically specified I/O and Replication models have been completed and the Broadcast model is
approximately 70% complete. We are in the process of translating an earlier SPECIAL state machine specification of the system into
the denotational model expressed in Boyer-Moore theory. This is done to be consistent with our goal of using the Boyer-Moore
theorem prover for our verification effort. Extensive design changes to SIFT necessitate revision of the SPECIAL specifications as well.
The Pascal executive is now operational and has performed well during preliminary testing. We anticipate some amount of future
modification to bring the implementation into line with our specifications. A Pascal to BDX930 compiler, developed outside SRI, is
being used to translate into machine code. The SIFT hardware has been built and is fully operational in our laboratory, and a
graphics-oriented flight simulator is being developed in order to simulate complete flight control.

Work on verification of the SIFT hierarchy is proceeding on several levels. As mentioned earlier, a recursive model of the
BDX930 machine has been written and early work is proceeding on verifying portions of the BDX930 code. Various parts of the Pascal
code, such as the voting algorithm, have been proved correct already. We anticipate that verification of the consistency of the higher
level models will first be accomplished by hand and later mechanized using the Boyer-Moore theorem prover. In preparation for the
mechanical verification of the higher level models, each higher level axiomatically specified model is being translated into a
Boyer-Moore denotational model. The result of this effort is to specify an abstract SIFT implementation based on the primitives employed in
the axiomatic specification. When accomplished, the recursive models will be adopted as the higher-level models to avoid a consistency
proof between the axiomatic and denotational models.

Our verification effort is expected to stop at the level of the BDX930 machine code. Left to be verified in order to make an
"absolute" claim of correctness is that the hardware functions according to specification and that the Markov analysis is sound. Also
left as an open question is whether our figures for the frequency and distribution of solid and transient faults reflect actual fault rates
encountered during aircraft operation. The answer to the latter question falls outside the realm of formal proof, of course, and must be
decided on the basis of empirical study.


Acknowledgments 

The design, specification and verification of the SIFT project has involved nearly all the members of the Computer Science
Laboratory, past and present. John Wensley led the original design effort for SIFT and conceived the basic architecture. Also
involved in the design were Jack Goldberg, Karl Levitt, P.M. Melliar-Smith, Leslie Lamport, Rob Shostak, Marshall Pease, Mike
Green, Bill Kautz, and Chuck Weinstock. Formalization of the SIFT models is being done by P.M. Melliar-Smith, Richard Schwartz,
and Leslie Lamport. The design of the Pascal implementation is due to Chuck Weinstock, Karl Levitt, Dwight Hare, and Mike Green. Bob
Boyer and J Moore are involved in the mechanical verification effort. Jack Goldberg is the current project leader. The SIFT hardware
was built by Bendix under subcontract.

The work reported here was supported by the NASA-Langley Research Center. The guidance of Nick Murray, our project
monitor, is gratefully acknowledged.


References 

[1]  Goldberg, J.
     Development and Evaluation of a Software Implemented Fault-Tolerance Computer: SIFT Hardware.
     Interim Technical Report, SRI International, Nov 1979.

[2]  Goldberg, J.
     SIFT: A Provable Fault Tolerant Computer for Aircraft Flight Control.
     Proceedings of IFIP Congress 80, 1980.

[3]  M. Pease, R. Shostak and L. Lamport.
     Reaching Agreement in the Presence of Faults.
     Journal of the ACM 27(2):228-234, April 1980.

[4]  Weinstock, C.
     SIFT: System Design and Implementation.
     10th International Symposium on Fault Tolerant Computing, October 1980.

[5]  J. Wensley et al.
     SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control.
     Proceedings of the IEEE 66(10):1240-1254, October 1978.


Figure 3:  Information Flows for Error Reporting and Reconfiguration


Figure 4:  A Part of a Schedule for Three Processors in SIFT


Figure 5:  The Hierarchy of Models and Analyses used to Substantiate the Reliability of SIFT




Reconfiguration,

a method to improve systems reliability
J.  Szlachta 

Litton  Technische  Werke 
7800  Freiburg i  W. -Germany 

ABSTRACT 


To improve the reliability of a flight-augmentation computer, a system with hardware and
software reconfiguration capabilities was developed. The system consists of a network of n
redundant computers, linked via m serial buses. A redundant computer consists of 2 CPUs,
2 memories and 2 or more I/O drivers. A fault in one of the components of the redundant
computers causes a hardware reconfiguration which replaces the faulty component by its
still functioning twin. If a redundant computer fails altogether, all tasks allocated to it
are transferred to one of the still working computers of the network. This is made possible
by loading dormant copies of the tasks into at least one other computer of the initial
system. These dormant copies are periodically supplied with the program status of the
active copy.


Introduction 

The objective of the Redundant Computer System project was the development of an
integrated hard- and software system as a basis for a highly reliable flight-augmentation
computer.

The reported reliability requirements for such a computer are of the magnitude 10
failures/hour for a 10 hour flight. Values of this magnitude can only be achieved by
the introduction of redundancy into the system. There are two standard methods by which
this can be done: firstly, by static redundancy, which means masking failures of individual
components by the use of majority decisions, and secondly by dynamic redundancy, which
means replacing a failed component with a still functioning one of the same type.

With few exceptions like SIFT and FTMP, the majority of published systems are based on
static redundancy. In the Redundant Computer System project we aimed to improve
reliability by dynamic redundancy at two levels.

The first level (systems level) consists of a network of n computers loosely coupled via
m serial buses. Redundant copies of the tasks to be run are distributed over these
computers.

At the second level (hardware level) the individual computers and bus links are
duplicated so that a faulty component can be replaced by its twin. After a brief overview
of the hardware basis at the systems level, we will concentrate our attention in this
paper on the mechanics by which reconfiguration is performed.


FIG 1  REDUNDANT COMPUTER

The  Redundant  Computer 

The general configuration of the individual computers of the Redundant Computer System
network is illustrated in fig. 1. They consist of the basic modules CPU, memory and I/O
driver. These basic modules are duplicated and linked using crosscoupling circuits. As the
system is switched on, all modules are active and constantly monitored as described
below. A reconfiguration to a component with a latent failure is therefore avoided.

The methods used for error detection in the different modules depend on their
respective type. For the two clock-synchronized CPUs the output produced is constantly
compared by a comparator. If this comparator detects a difference, a self-test program
is started by hardware in order to identify the faulty CPU. The run time of this test
program is less than 1 millisecond. In a sense it replaces the third CPU of a
minimum majority system.
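
The comparator-plus-self-test decision can be sketched in Python as follows. This is a software illustration of logic that the paper places in hardware; the function select_cpu_output and the self_test callback are hypothetical names, and the behaviour when the self-test cannot isolate exactly one faulty CPU is an assumption, since the text does not describe that case.

```python
def select_cpu_output(out1, out2, self_test):
    """Return (trusted output, set of CPUs considered healthy).

    out1 and out2 are the outputs of the two synchronized CPUs;
    self_test(cpu) is assumed to return True when that CPU passes its test.
    """
    if out1 == out2:
        # Comparator sees no difference: both CPUs stay in service.
        return out1, {1, 2}
    # Outputs differ: the self-test takes the place of a third, deciding CPU.
    healthy = {cpu for cpu in (1, 2) if self_test(cpu)}
    if healthy == {1}:
        return out1, healthy
    if healthy == {2}:
        return out2, healthy
    raise RuntimeError("self-test could not isolate a single faulty CPU")


# Example: CPU 1 produces a wrong result and fails its self-test.
output, healthy = select_cpu_output(0x1234, 0x1230, lambda cpu: cpu == 1)
agreed = select_cpu_output(5, 5, lambda cpu: True)
faulty_one = select_cpu_output(7, 9, lambda cpu: cpu == 2)
```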

The error checking of the memories is performed with the use of testbits which are
stored together with the databits. These testbits not only protect the data but also their
addresses.
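
How a testbit can cover the address as well as the data may be illustrated with a short Python sketch. The paper does not state which code is used; a single parity bit computed over data and address is assumed here only to show the principle, and the names parity, store and load are hypothetical.

```python
def parity(x):
    """Parity (0 or 1) of the set bits of a non-negative integer."""
    return bin(x).count("1") & 1


def store(memory, address, data):
    """Store data together with a testbit that covers both data and address."""
    memory[address] = (data, parity(data) ^ parity(address))


def load(memory, address):
    """Return the data at `address`, or raise if the testbit does not match."""
    data, testbit = memory[address]
    if testbit != parity(data) ^ parity(address):
        raise ValueError("memory check failed at address %#x" % address)
    return data


memory = {}
store(memory, 0b0110, 0xA5)
value = load(memory, 0b0110)

# Simulated addressing fault: the word stored at 0b0110 is delivered for 0b0111.
memory[0b0111] = memory[0b0110]
```

Because the address enters the testbit, a word that is intact but delivered from the wrong location is rejected in the same way as a word with a flipped data bit, within the limits of a single parity bit.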

The  error  checking  of  the 
I/O  drivers  uses  special  test 
data  and  self-test  loops. 


FIG 2  CROSSCOUPLING MULTIPLEXERS


The crosscoupling circuits are configured as multiple multiplexers (see fig. 2), each of
which is switched by the use of individually generated control signals. A fault in one of
the crosscouplers, even certain double faults, does not interrupt the dataflow between the
modules.


FIG 3  DATAPATH AFTER INITIALIZATION

Reconfiguration  of  the  Computer 

At the initialisation of the system, control signals for the crosscouplers are generated
so that CPU 1 is connected to memory 1, and memory 1 is connected to I/O driver 1.
A datapath is thus created which passes over the number 1 modules (see fig. 3). The
number 2 modules are producing data as well and therefore have to receive the same input
as the number 1 modules, but their output enters only the monitoring circuits and is
blocked from the normal dataflow by the crosscouplers.

If, for example, the monitoring circuitry detects a fault in memory 1, it generates
control signals which interrupt the dataflow between CPU 1 and memory 1, and memory 1
and I/O driver 1. Instead, a connection is created between CPU 1 and memory 2 and
memory 2 and I/O driver 1. As shown in fig. 4 the datapath now passes over CPU 1,
memory 2, and I/O driver 1. The reconfigured computer is still functioning, and, as far
as CPUs and I/O drivers are concerned, still redundant.
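
The datapath selection described above can be summarized in a small Python sketch. It models only the outcome of the crosscoupler switching, not the circuitry; the function name datapath, the module labels and the rule of preferring module 1 when both twins are healthy are assumptions chosen to match figs. 3 and 4.

```python
MODULE_TYPES = ("cpu", "memory", "io")


def datapath(faulty):
    """Choose one working module of each type, preferring module 1.

    `faulty` is a set of (module type, number) pairs reported by the monitoring
    circuitry. Returns a mapping from module type to the selected module number.
    """
    path = {}
    for kind in MODULE_TYPES:
        working = [n for n in (1, 2) if (kind, n) not in faulty]
        if not working:
            raise RuntimeError("both %s modules have failed" % kind)
        path[kind] = working[0]
    return path


initial = datapath(set())                 # datapath after initialization (fig. 3)
reconfigured = datapath({("memory", 1)})  # datapath after a fault in memory 1 (fig. 4)
```

A computer reconfigured in this way has lost redundancy only for the module type that failed, which is the situation the systems level is informed about.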


FIG 4  DATAPATH AFTER RECONFIGURATION

The systems level is informed about this event by the creation of specific status
information. It can now react by the removal of critical tasks from the faulty computer.


The Redundant Network

The hardware at systems level consists of a network of n redundant computers which are
loosely coupled by m redundant serial buses. An example is given in fig. 5. This figure
contains some additional details which refer to the demonstration model. None of the
components of the network is distinguished from any of the others. Within the limits of
capacity, each of the redundant computers can perform the same function as all the others.

FIG 5  REDUNDANT COMPUTER NETWORK

Contrary to the hardware level, where the monitoring of the main modules is performed
by special circuitry, error detection in the computer network at systems level is done
entirely by software. The main method used for error detection is activity monitoring,
that is, checking whether each computer produces output and update messages within a
predefined time interval. The purpose of update messages will be described later.

Systems Level Software


The software at systems level consists of three types: the systemskernel, the
systemsprocesses, and the userprocesses. Systemskernel and systemsprocesses together
allow the implementation of user functions independent of the system configuration and
without any knowledge of the internal structure of the systems software. The actual
assignment of the individual user functions to the computers of the redundant network,
at any moment in time, is hidden from the user. The userprocesses perform the systems
data processing functions. It is these userprocesses which the systems software attempts
to keep running in case the computer to which they were originally allocated fails.

Systemskernel 

The systemskernel provides all those functions which are necessary for the control of
and the communication between the systems- and the userprocesses. These functions are
basically identical with those provided by the kernel of a normal process control
operating system.

In addition, however, the kernel of the Redundant Computer System contains the two
functions "send update" and "receive update", with which a process can send and receive
program status information. This program status information is used to synchronize the
program status of redundant copies of a userprocess.
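
The effect of the two kernel functions can be illustrated with a minimal Python sketch. The class ProcessCopy, the message format and the counting user function in step are all hypothetical; the real kernel interface, and the way an activated copy resumes at a particular point in its program, are not reproduced here.

```python
class ProcessCopy:
    """One copy of a userprocess; passive until it is told to activate."""

    def __init__(self):
        self.active = False
        self.status = {}  # program status last produced or received

    def receive_update(self, message):
        """Accept a status update, or an activation message, from elsewhere."""
        kind, payload = message
        if kind == "status":
            self.status = dict(payload)
        elif kind == "activate":
            self.active = True

    def send_update(self, passive_copies):
        """Supply every passive copy with the current program status."""
        for copy in passive_copies:
            copy.receive_update(("status", self.status))

    def step(self):
        """Invented user function: count iterations and return the count."""
        self.status["count"] = self.status.get("count", 0) + 1
        return self.status["count"]


primary, standby = ProcessCopy(), ProcessCopy()
primary.active = True
primary.step()
primary.step()
primary.send_update([standby])        # standby now holds the status after two steps
lost_output = primary.step()          # third step is computed, then the computer fails
standby.receive_update(("activate", None))
resumed_output = standby.step()       # standby repeats the third step
```

Because the standby resumes from the status carried by the last update, it reproduces the output that the failed copy was working on.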


Systemsprocesses

The systemsprocesses perform the bulk of the redundancy-administration at systems
level. They exist, like the kernel functions, as identical copies in all computers of the
network. When the system is switched on, every copy of a systemsprocess is active in
parallel.


PROCESS CONFIGURATION
(table of processes, system 1 to 3 and user 4 to 9, against computers 1 to 3; an X marks a loaded copy)

FIG 6  AFTER INITIALIZATION


The systemsprocesses maintain local copies of network description data in the
individual computers. This data must be kept consistent throughout the system, so that all
decisions based on it are independent of the arbitrary computer on which they are
performed. To achieve this, one of the systemsprocesses, the configuration administrator,
circulates a synchronising message through the system. Each local copy of the
configuration administrator is identified by a unique index number. Using this index
number a neighbourhood relationship is established between the copies. The neighbour of
the process with the highest index number is the one with the lowest. The synchronising
message is handed from one copy of the configuration administrator to its neighbour. A
special algorithm is used to prevent duplication of the synchronising message and to
recreate it in case it is lost with a failing computer.

Besides being used for the synchronisation of the network description, these messages
are part of the error detection algorithm at systems level. The receiving copy of the
configuration administrator sends an acknowledgement to its predecessor. Using this
handshake protocol, an activity test on the successor is performed. If this test fails, the
configuration administrator repeats the test with the successor of its original successor,
etc. The results of these tests are recorded in the synchronising message. An auxiliary
test channel is used to differentiate between computer and communication failure.
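
The neighbourhood relation and the handshake test can be sketched in Python as follows. The functions successor and pass_message and the acknowledges callback are hypothetical; message formats, timeouts, the auxiliary test channel and the algorithm that recreates a lost message are left out.

```python
def successor(index, indices):
    """Neighbour of `index` in the ring: the next higher index number,
    wrapping from the highest index number to the lowest."""
    ring = sorted(indices)
    return ring[(ring.index(index) + 1) % len(ring)]


def pass_message(index, indices, acknowledges):
    """Hand the synchronising message to the first successor that acknowledges.

    acknowledges(i) is assumed to return True when computer i answers within
    the allowed time. Returns (receiver or None, successors that failed the test).
    """
    failed = []
    candidate = successor(index, indices)
    while candidate != index:
        if acknowledges(candidate):
            return candidate, failed
        failed.append(candidate)  # result to be recorded in the message
        candidate = successor(candidate, indices)
    return None, failed


computers = [1, 2, 3, 4]
receiver, failed = pass_message(1, computers, lambda i: i != 2)  # computer 2 is silent
alone = pass_message(1, computers, lambda i: False)              # nobody answers
```

In the first call the message skips the silent computer 2 and reaches computer 3, with computer 2 noted as having failed its activity test.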

Userprocesses

The userprocesses perform the systems data processing functions proper. Copies of
the code of these functions are loaded into several or all computers of the initial
network. Obviously, only one of these redundant copies should, at any moment in time,
take part in active data processing.




Reconfiguration of Userprocesses

Each userprocess for which redundant copies exist must, immediately after its
initialization, perform a call to the kernel function "receive update". In this function,
the userprocess waits for the arrival of update messages. If such a message arrives, it
will be accepted and its data distributed. The process then returns to the "wait for
update message" state. At systems initialization, each copy of all the userprocesses is
inside the "receive update" call, and therefore passive.

It is the function of the systemsprocess process-administrator to activate one and only
one of all the copies of a userprocess. When activated, the process-administrator scans
the local process configuration description until it finds a userprocess of which no
active copy exists. Figure 6 illustrates such a process configuration description. For
fig. 6 to fig. 9 the X indicates that a copy of that particular process is loaded into
the respective computer, and a dot indicates that the copy is active.

At  systems  initialization 
this  will  be  the  first  userpro¬ 
cess  it  finds.  It  then  checks 
wether  this  process  fulfills 
the  locality  criteria.  If  not* 
it  continues  scanning  the 
table*  if  yes*  it  activates  the 
local  copy  of  the  process  by 
sending  it  a  special  type  of 
update  message.  This  special 
update  message  causes  the 
receiving  process  to  leave  the 
"receive  update"  call  and 
resume  activity  behind  the 


PROCESS  CONFIGURATION 


^\COMPUTER 

PROCESS 

'  2  3 

SYSTEM  1 

*  •  X  •  X  . 

} 

X  •  X  .  X  • 

3 

x  •  x  •  x  • 

USER  4 

X  •  X  X 

5 

x  X  .  X 

t 

x  X  X  • 

7 

x  •  X  X 

t 

X  *  X 

9 

X  • 

FIG?  AFTER  USERPROCESS  ACTIVATION 


"send  update"  call  to  which  the 
last  received  update  message 
belonged.  If  no  update  message 
has been received, resumption takes place behind the "receive
update" call. The state of the process configuration description
after systems initialization is shown in fig. 7. The active
copies supply all other copies with process status information
by periodically calling the kernel function "send update". This
call has to be parameterized with a description of all the
process data which characterizes the program status at the point
of call. The status of a passive userprocess copy, after
receiving an update message, corresponds exactly to the status
of the active copy when sending it. It is therefore possible to
activate a passive copy immediately behind the send call of the
last received update message. The newly
activated copy will produce exactly the same output as the
originally active one.

PROCESS CONFIGURATION

FIG. 8  AFTER FAULT IN COMPUTER 2

If a computer of the network fails, another of the systems
processes, the configuration-administrator, marks the column in
the process configuration description belonging to the failed
computer as empty. Fig. 8 shows the process configuration
description after computer 2 has failed. Again, synchronising
messages are used to ensure consistency of the configuration
description before the process-administrator acts upon it.

The next run of the periodically started process-administrator
now again finds processes of which no active copy exists.


FIG. 9  AFTER  RECONFIGURATION 


Because of the loss of one or more computers, the locality
criterion has changed for the now fully passive processes. It
has become true in exactly one of the still running computers.
As described above, this copy of the userprocess will now be
activated by the local copy of the process-administrator. The
result of this activation is illustrated in fig. 9. In closing
it should be mentioned that this reconfiguration algorithm can
also cope with the return of a computer into active service.
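
The activation and reconfiguration steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the locality criterion used here (the lowest-numbered running computer that holds a copy) is hypothetical, since the real criterion is not given in this section, and the names `administrator_run`, `mark_failed` and `settle` are invented for the sketch.

```python
from typing import Dict, List, Optional, Set

Config = Dict[str, Set[int]]        # process -> computers holding a copy (the "X")
Active = Dict[str, Optional[int]]   # process -> computer of the active copy (the dot)


def locality_ok(process: str, computer: int, config: Config) -> bool:
    """Hypothetical locality criterion: true only on the lowest-numbered
    running computer that holds a copy of the process."""
    holders = config[process]
    return bool(holders) and computer == min(holders)


def administrator_run(computer: int, config: Config, active: Active) -> Optional[str]:
    """One run of the process-administrator on `computer`: scan the table and
    activate the first userprocess that has no active copy and is local."""
    for process in sorted(config):
        if active.get(process) is not None:
            continue                      # an active copy already exists
        if locality_ok(process, computer, config):
            active[process] = computer    # stands in for the special update message
            return process
    return None


def mark_failed(computer: int, config: Config, active: Active) -> None:
    """Configuration-administrator: empty the column of a failed computer."""
    for process, holders in config.items():
        holders.discard(computer)
        if active.get(process) == computer:
            active[process] = None


def settle(computers: List[int], config: Config, active: Active) -> None:
    """Repeat the periodic administrator runs until nothing is left to activate."""
    progress = True
    while progress:
        progress = False
        for c in computers:
            if administrator_run(c, config, active) is not None:
                progress = True


config: Config = {"user4": {1, 2, 3}, "user5": {2, 3}, "user6": {3}}
active: Active = {}
settle([1, 2, 3], config, active)       # systems initialization
after_init = dict(active)
mark_failed(2, config, active)          # computer 2 fails
settle([1, 3], config, active)          # next administrator runs
```

In this example the copy of `user5` on computer 3 takes over after the fault, while the other processes keep their active copies. The transfer of process status through "send update" is not modelled.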

State of the Project

Currently we are building a demonstration model of a redundant
network consisting of one redundant and one non-redundant
computer of the type LITEF 1432 linked by two twin 1553B bus
connections. The systems software is nearing completion and has
been tested in a simulated environment on a single computer.
Integration of the system with reconfiguration tests and error
simulations is planned for the second half of 1981.


Reconfigurable Exchange Network for Distributed Process Control
(Réseau d'Echange Reconfigurable pour Contrôle de Processus Réparti)
Ch. MERAUD (SAGEM, 6, avenue d'Iéna, PARIS 16ème)
B. MAUREL (SAT, 41, rue Cantagrel, PARIS 13ème)




SUMMARY


This paper presents the results, for the protocol part, of the study of an ultra-reliable,
high-throughput exchange system.

The system is intended to provide decentralized exchanges between the various items of equipment
on board an aircraft or another type of vehicle, for the integration and reconfiguration of functions
that may be critical.

The advent of VLSI and of optical fibres (2), which are immune to electromagnetic interference,
has led, in order to meet these objectives, to a high-performance decentralized solution in which
intelligence is built into a universal connection module called the Subsystem Interface (Interface
Sous-Système, ISS).

The principle adopted replaces the traditional programmed mechanism for managing exchanges with
a dynamic mechanism that adapts immediately to modifications and allows great flexibility of
synchronization. It works by broadcasting messages made up of 16-bit words, following a score
played, in the manner of a chamber orchestra, by all the ISSs distributed over the connected equipment.

The ISSs synchronize with one another by extracting the clock of the transmission in progress, and
confer periodically to validate the exchanges, to switch mode if necessary, or to recover from failures.

The management of exchanges for each item of equipment is therefore entrusted to the ISS it contains.
The ISS plays its own part by interpreting the message parameters that the equipment manufacturer has
written into a read-only memory. Periodically, the ISSs meet to exchange the cyclic codes that each of
them has computed from the information observed on the line during the preceding cycle. This phase
serves for the detection and diagnosis of failures, for reconfiguration and, finally, for the
resynchronization and reinitialization of ISSs that have suffered a transient failure.

The principle replaces the traditional programmed mechanism for managing exchanges with a dynamic
mechanism that adapts immediately to modifications and allows great flexibility of synchronization.


INTRODUCTION


The advent in the early 1970s of standardized multiplexed serial links (Digibus GINA,
1553 bus, ...) for carrying inter-equipment communications made possible the first generations
of integrated digital systems for weapon or vehicle control.

Three considerations suggest a new direction for implementing the communication function: the
evolution of future weapon systems towards higher performance, greater complexity and greater
criticality all at once; the experience gained with the current standardized links, whose
capabilities are limited with regard to future needs; and, on the other hand, the considerable
increase in technological performance brought by VLSI and optical fibres. To meet the objectives,
they point towards a distributed solution in which intelligence is built into a universal connection
module (ISS) incorporated in each item of equipment.

Research along these lines was carried out under a DRET contract (1) by SAGEM in
collaboration with the Société Electronique Marcel DASSAULT. In December 1979 it led to a solution
based on the line components of the current standardized buses, allowing exchanges at 1 or
? MBd. The flexibility of this safe and immediately feasible solution should be stressed.

The solution presented here, in collaboration with SAT (2), provides for the use of optical
fibres to reach the exchange rates of 10 MBd needed in future systems.

It will then make it possible:

-  to simplify maintenance by carrying all digital communications (slow and fast) under
the same protocol and with a single family of hardware (a network of redundant 10 MBd optical buses
and a single type of highly integrated coupler built into each item of equipment),

-  to make the communication function survive failures automatically and to obtain a precise diagnosis
that eases maintenance. Operational safety could be better than 10^-10 per hour (probability
of a failure that is not rendered passive), and reliability before repair, given the possibilities
of automatic reconfiguration, could reach 10^? per 24 hours under severe environmental conditions,

-  to incorporate redundant equipment in order to filter out its failures and to ease on-line
maintenance operations,

-  to connect equipment and to modify the configuration and the exchanges of the system with great
flexibility for both hardware and software,

-  to obtain precise time-stamping of the variables exchanged.




Seen through the ISS connection module, this communication system will give manufacturers of
on-board equipment a universal means of interconnection that is easy to interface and whose built-in
intelligence will free the equipment from the constraints associated with the exchanges.

In this sense, it will allow private, predefined exchanges between boxes of the same
subsystem, so as to increase their interchangeability from one application to another.

To the same end, it will replace the traditional mechanism of managing exchanges by programming
with a dynamic, non-programmed mechanism that adapts immediately to modifications, improves the
time-stamping of variables and allows great flexibility of synchronization between the computing
equipment of the system.


For the prime contractor and the equipment manufacturer, it will come with an effective means of
managing the integration of the system, or of a separate subsystem during development, in the form of
software tools for design and analysis that allow the exchanges to be verified and optimized,
each release of the system to be documented and the message management parameters to be generated.


GENERAL DESCRIPTION

Composition of a Subsystem Interface (figure 1)

The items of equipment are interconnected by a dual multiplexed serial link through an intelligent
interface called the ISS (Subsystem Interface).

In addition to the conventional interface functions, its function is:

-  to manage the system or private exchanges, critical or not, on the multiplexed link,

-  to allow the tasks of its equipment to be synchronized with the execution of the exchanges,

-  to detect transmission errors and to recover from them by reconfiguring the bus,

-  to detect and recover from its own faults, and to take itself out of service and isolate itself
from the bus in the event of a permanent failure.


Hardware structure of the bus

The bus is physically duplicated so as to survive failures. Each line consists of two
fibres, one upstream and one downstream.

The taps are made with transparent passive couplers. In normal operation, each ISS
repeats the incoming information so that the signal level is maintained along the whole line. If an
ISS fails, the transparency of the coupler ensures the continuity of the link.


Hardware structure of an ISS (figure 2)

An ISS is a specialized channel built around a duplicated microprogrammed controller. Such a
structure makes it possible to detect any operating anomaly and render it passive immediately, so as
to prevent any erratic behaviour of an ISS towards the bus.

The ISS executes a channel microprogram that interprets the management instructions of only those
messages that concern the equipment. The instructions are stored in an EPROM belonging to the equipment.

It is connected to the physical lines of the duplicated bus by an optical connector and by the
circuits for parallel/serial conversion, modulation/demodulation and opto-electronic
transmission/reception, which provide the functions of the line interface level.


Information exchanged

Two types of information have to be exchanged: periodic variables, and random variables
with fixed maximum delays. Since the periodicity of transmission of variables of the first type has its
source in the equipment, from the point of view of the exchanges they can be treated like the second
type, provided the specified delivery delays are met. This is achieved by distributing
the variables over 4 priority levels.


Redundancy management

Equipment that generates critical variables can be installed several times, which allows these
variables to be broadcast more than once in order to increase their availability. On reception, the ISS
of each item of equipment concerned filters out the failures and records only the correct result
in the area of the equipment assigned to the reception of the variable.

This mechanism favours interchangeability by allowing standard equipment to be integrated
redundantly into critical systems without requiring significant modifications to it.




Tolerance of failures and transmission errors

The internal structure of the ISS ensures perfect detection of its own errors thanks to the
choice of a duplicated and compared structure.

Transmission errors are detected:

-  by detecting modulation errors, thanks to the use of a self-clocking code,

-  by using one parity bit per 16-bit word exchanged,

-  by using a cyclic code that is computed and compared periodically by all the ISSs.

When an error is detected, a switch-over to the second line is carried out. This mechanism
makes it possible to filter out transient failures without degrading the system.
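
The per-word parity check mentioned above can be illustrated as follows. This is a minimal sketch: the text does not say whether the parity is even or odd, nor where the parity bit sits, so even parity and a parity bit appended as the least significant bit are assumptions, as are the function names.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit (assumed) for one 16-bit word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return bin(word).count("1") & 1


def encode(word: int) -> int:
    """Append the parity bit below the 16 data bits (assumed layout)."""
    return (word << 1) | parity_bit(word)


def check(frame: int) -> bool:
    """True when the received 17-bit frame has consistent parity."""
    return parity_bit(frame >> 1) == (frame & 1)


frame = encode(0xBEEF)
single_error = frame ^ 0b100     # one bit flipped: detected
double_error = frame ^ 0b110     # two bits flipped: parity alone misses it
```

A single parity bit catches any odd number of flipped bits but no even number, which is consistent with the text adding a cyclic code computed over the whole cycle.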

An ISS that detects a persistent error of which it is the cause takes itself out of service, with
perfect effectiveness because of its duplicated structure.


To avoid the accumulation of hidden failures on the standby bus, the roles of the two buses are
periodically swapped, so that the standby bus is exercised by normal operation and benefits from
preventive repair, which avoids a double failure.


DESCRIPTION OF OPERATION


Need for great flexibility


The conventional management of exchanges according to a programmed frame, used to obtain the
periodicity of sampling or the time-stamping of variables, is carried out by a single centralized
computer by means of a specially written channel program that has to be modified at every change of
system configuration.


This constraining technique stems from the original, automaton-style way of programming
computers. At that time there was a single computer in a system, which eliminated synchronization
problems. Used today to synchronize multi-computer systems, it is
ill-suited and makes any modification laborious. Today, the parallel and asynchronous operation
of the computers of the system requires, in order to synchronize flexibly the data exchanges between
the multiple, hierarchically organized tasks they execute, an elasticity in the scheduling of
exchanges that centralized programmed management does not allow. As a result, to time-stamp
precisely the delays in fast exchanges (for example to better than a quarter of a period for
20 ms variables), it is necessary:

-  either to multiply the sampling frequency by 4,

-  or to extrapolate at transmission and synchronize the sending and consuming tasks on the frame,

-  or to carry the data together with its production date and extrapolate at the point of consumption.


The first two solutions entail a high computing overhead and cumbersome management; the
third, which is more correct, already no longer uses the periodic frame as a time-stamping tool.


As regards the present need to make rapid modifications of system configuration easily,
removing the compulsory presence of a central exchange-management computer and its
programs is a marked gain in flexibility. It gives subsystem manufacturers back
their technical autonomy and the initiative in their field of competence. Technical
responsibilities are better defined, which helps the task of the prime contractor and the design
reliability of the whole.


Decentralized allocation of the bus to the transmission requests of the tasks

The variables to be exchanged are produced by the set of tasks distributed over the
various items of equipment. The broadcasting of these variables has to be scheduled for consumption
by the same set of tasks, at the initiative of each of them.

To be exchanged, the variables are grouped, as trains of 16-bit words, into messages with a
fixed structure, a header label and a priority defined at system level. The messages are
broadcast by the ISS of each item of equipment in turn, and identified on reception by the label.

The problem of scheduling messages on the bus is of the same nature as that of scheduling
tasks on a central processing unit. The latter is now solved very satisfactorily in modern
multitasking real-time systems by a mechanism of priority activation driven by events, together with
the management of queues of ready tasks at each priority level. The best solution
is therefore to adopt this mechanism for scheduling the messages. This provides a flexible and
simple interface between the real-time monitors of each item of equipment and the exchange function.
It remains to find a distributed solution for executing this mechanism.

This is achieved by having all the ISSs execute the same algorithm simultaneously, which means
that they must operate in synchronism. At any instant, one ISS is transmitting. The others are then
receiving, synchronized by the clock of the transmission in progress. There is thus, at any instant,
a clock common to the population of ISSs, which is sufficient for the synchronization needs.




Implementation of a periodic exchange control phase

All the messages that may be broadcast have been distributed in advance over the 4 priority
levels of the system. The current level remains active as long as there are messages in the system
waiting to be transmitted at that level, and no messages have appeared at a higher level.

To decide on changes of level, and for the other needs of exchange control, a
dialogue between the ISSs is necessary. To keep delays to a minimum while maintaining good
bus efficiency, a window of 64 words in a cycle of 1024 words exchanged is reserved for these needs.
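
The cost of the control window follows directly from the two figures given above. The short calculation below works it out; the conversion to a cycle duration is only indicative, since it assumes 17 bits per word (16 data bits plus parity) and treats the 10 MBd line as 10 Mbit/s, neither of which is stated in the text.

```python
CYCLE_WORDS = 1024    # words exchanged per cycle (from the text)
CONTROL_WORDS = 64    # words reserved for the control window (from the text)


def bus_efficiency(cycle_words: int = CYCLE_WORDS,
                   control_words: int = CONTROL_WORDS) -> float:
    """Fraction of the cycle left for message traffic."""
    return (cycle_words - control_words) / cycle_words


def cycle_duration_s(bit_rate_hz: float, bits_per_word: int,
                     cycle_words: int = CYCLE_WORDS) -> float:
    """Length of one cycle in seconds for an assumed line rate and word size."""
    return cycle_words * bits_per_word / bit_rate_hz


efficiency = bus_efficiency()              # 960 / 1024
duration = cycle_duration_s(10e6, 17)      # assumed 10 Mbit/s and 17 bits per word
```

The window takes 6.25 % of the cycle, leaving 93.75 % for messages. Under the assumed rate, a cycle lasts about 1.74 ms, which gives the order of magnitude of the wait before a change to a higher level can be decided.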

At the start of the window, each ISS determines, among 4 queues of 64 bits in which the messages
to be transmitted are posted, the highest level whose queue is not empty. This level is inserted on
2 bits in a control word. These control words are then broadcast in succession by the ISSs in order
of increasing physical address. At the end of these transmissions, when the control window closes to
start a new cycle, the ISSs know:

-  whether they must continue the exchanges at the same level,

-  whether they must switch to a higher level and which one must take the bus (it is taken by the ISS
with the lowest physical address at that level). In this case, each ISS saves the pointers of the
interrupted level for a later return.
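
The decision taken at the end of the control window can be sketched as below. The sketch assumes that every ISS has received all announced levels; how an ISS with nothing to send encodes this on 2 bits is not stated in the text, so it is represented here by `None`, and the function names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def highest_pending_level(queues: List[int]) -> Optional[int]:
    """queues[level] is a 64-bit bitmap of messages ready at that level."""
    for level in range(len(queues) - 1, -1, -1):
        if queues[level]:
            return level
    return None


def arbitrate(current_level: int,
              announced: Dict[int, Optional[int]]) -> Tuple[int, Optional[int]]:
    """Return (next_level, speaker) from the levels announced per address.

    speaker is None when the exchanges simply continue at the current level.
    """
    requests = {addr: lvl for addr, lvl in announced.items() if lvl is not None}
    if not requests:
        return current_level, None
    top = max(requests.values())
    if top <= current_level:
        return current_level, None
    # The ISS with the lowest physical address at the new level takes the bus.
    speaker = min(addr for addr, lvl in requests.items() if lvl == top)
    return top, speaker


announced = {
    3: highest_pending_level([0b1, 0, 0b100, 0]),   # level 2
    5: highest_pending_level([0, 0b1, 0, 0]),       # level 1
    9: highest_pending_level([0, 0, 0b1, 0]),       # level 2
}
switch = arbitrate(1, announced)
stay = arbitrate(2, announced)
```

Because every ISS runs the same function on the same announced values, all of them reach the same decision without a central arbiter.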

During the cycle and outside the control window, the ISSs behave as a specialized channel
between the memories of the equipment they serve and the bus. They perform only simple word-transfer
operations, using pointers incremented at each step, and the updating of the cyclic code.
Nevertheless, when the transmitting ISS has no more messages at the current level, switching to a
successor takes place immediately. These switches go to another ISS, either at the same level or at a
lower level, starting from the pointer values saved earlier. They require practically
no computation, the switching parameters having been determined during the last control window.


Criterion for distributing the messages over the priority levels (figure 4)

The 4 levels are as follows:

-  level 0: deferred exchanges (long exchanges, routine monitoring exchanges, miscellaneous
background exchanges, ...),

-  level 1: ordinary real-time exchanges,

-  level 2: urgent or high-frequency real-time exchanges,

-  level 3: alarms, etc.

Clearly, the flexibility of the scheme and its ease of evolution depend on the load
margins.


Figure 4 illustrates the principle of the distribution on an example limited to 3 levels for
greater clarity. On the ordinate, the messages are ranked in order of decreasing specified urgency
(right-hand curve). The worst-case delivery delay is then computed (left-hand curve). It is obtained,
for peaks of demand, by accumulating the transfer times of the messages, assumed to be transmitted in
the order of their ranking on the ordinate. The figure then shows the principle of a distribution by
level that leaves balanced and maximal margins.
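
The worst-case computation described for figure 4 amounts to a running sum over the messages ranked by urgency. The sketch below shows that step only; the message names and figures are invented for illustration, times are in microseconds, and the choice of level boundaries that balances the margins is not attempted.

```python
from itertools import accumulate
from typing import List, Tuple

Message = Tuple[str, int, int]   # (name, specified maximum delay, transfer time)


def worst_case_margins(messages: List[Message]) -> List[Tuple[str, int, int]]:
    """Return (name, worst-case delay, margin), most urgent message first.

    The worst case assumes that every message is pending at once and that
    they are sent in order of decreasing urgency.
    """
    ordered = sorted(messages, key=lambda m: m[1])
    delays = accumulate(m[2] for m in ordered)
    return [(name, delay, limit - delay)
            for (name, limit, _), delay in zip(ordered, delays)]


example = [
    ("log", 50_000, 10_000),
    ("alarm", 1_000, 200),
    ("attitude", 5_000, 1_000),
]
margins = worst_case_margins(example)
```

A negative margin would indicate a message that cannot meet its specified delay at peak load.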


Comparison with a traditional programmed frame

The frame of exchanges obtained is very close to what is obtained by programming the
exchanges in the traditional solution. The differences are as follows:

-  at the level of fast exchanges the difference is insignificant,

-  at the lower levels, the exchange is completed within the specified delay, but with an uncertainty
of position that grows towards the low levels, owing to the load variations of the higher levels. This
growing "jitter", generally of no importance, can if necessary be compensated by the fine
time-stamping mechanism described below.

In return, the non-urgent exchanges balance the load on the bus and allow its full capacity to be
exploited efficiently.

Doing away with the programming of exchanges allows rapid modifications of system configuration,
in accordance with the desired flexibility specifications.


Fine time-stamping for computing the ageing of variables

The effective solution to this need, for the variables that require it, is to carry the
sampling date in the message rather than to increase needlessly the exchange frequency beyond
the consumption frequency that is strictly necessary. The user tasks can then update
the received values individually.




For this purpose, a system time, usable only for delay computations, is maintained by
each ISS at the rate of the word exchanges on the line. This time is transmitted to each item of
equipment by its ISS with a specific periodicity (for example every 0.5 ms; too high a frequency
would needlessly saturate the direct memory access of the equipment).
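
The use a consuming task can make of the transported sampling date can be sketched as follows. The text only says that tasks may update the received values; the first-order extrapolation used here, the 16-bit wrap-around of the system time and all names are assumptions made for the illustration.

```python
TIME_MODULUS = 1 << 16   # assumed size of the system-time counter, in ticks


def age_ticks(sample_time: int, now: int, modulus: int = TIME_MODULUS) -> int:
    """Ticks elapsed since sampling, tolerant of one counter wrap-around."""
    return (now - sample_time) % modulus


def refresh(value: float, rate_per_tick: float,
            sample_time: int, now: int) -> float:
    """First-order extrapolation of a received value to the current time."""
    return value + rate_per_tick * age_ticks(sample_time, now)


elapsed = age_ticks(65530, 4)               # the counter wrapped in between
updated = refresh(100.0, 0.5, 65530, 4)     # value carried forward over that age
```

Since the system time serves only to measure delays, a wrapping counter is sufficient as long as ages stay below one full period of the counter.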




Mutual synchronization of exchanges and tasks (figure 6)

The variables are produced and consumed by tasks hosted in the items of equipment.

A flexible mechanism based on referenced events, interrupts or queueing provides the mutual
synchronization between tasks and messages. The mechanism is as follows:

-  for transmission, the message prepared by a task is signalled "ready" to the ISS by setting a
status bit in a table of 64 bits (for a maximum of 64 eligible messages per item of equipment). It
will be broadcast as soon as possible by the ISS, taking its priority level into account,

-  for reception, since messages are systematically broadcast in label mode (that is, with a name
in a header procedure word), it is up to the ISSs to detect, by means of the parameters written in
the adaptation PROM of the equipment, the messages that concern them. The management instruction
supplied on 32 bits by the EPROM allows the ISS to determine whether it is concerned and what it must
do to load the message into its proper place and then insert an event number, with a status report,
into a queue of the equipment. The equipment processes this queue at its own pace, activating the
tasks waiting on the events it contains.
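
The two halves of this mechanism can be modelled with a 64-bit integer and a queue. The sketch below is illustrative only: the per-slot priority map, the tie-break on the lowest slot number and the class and function names are assumptions, not details given in the text.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Optional, Tuple

TABLE_BITS = 64   # one status bit per eligible message (from the text)


def mark_ready(table: int, slot: int) -> int:
    """Task side: flag the message in `slot` as ready for transmission."""
    if not 0 <= slot < TABLE_BITS:
        raise ValueError("slot out of range")
    return table | (1 << slot)


def next_to_send(table: int, level_of: Dict[int, int]) -> Optional[int]:
    """ISS side: pick the ready slot with the highest priority level."""
    ready = [s for s in range(TABLE_BITS) if (table >> s) & 1]
    if not ready:
        return None
    return max(ready, key=lambda s: (level_of.get(s, 0), -s))


def clear_sent(table: int, slot: int) -> int:
    """Clear the status bit once the message has been broadcast."""
    return table & ~(1 << slot)


class EventQueue:
    """Reception side: events posted by the ISS, drained by the equipment."""

    def __init__(self) -> None:
        self._events: Deque[Tuple[int, str]] = deque()

    def post(self, event_number: int, status: str) -> None:
        self._events.append((event_number, status))

    def drain(self) -> Iterator[Tuple[int, str]]:
        while self._events:
            yield self._events.popleft()


table = mark_ready(mark_ready(0, 5), 17)
level_of = {5: 1, 17: 3}
first = next_to_send(table, level_of)
table = clear_sent(table, first)
second = next_to_send(table, level_of)

queue = EventQueue()
queue.post(12, "ok")
queue.post(7, "ok")
received = list(queue.drain())
```

The equipment never waits on the bus: it sets a bit to send and empties the queue at its own pace to receive.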


Addressing scope with a nested block structure

The naming scope of the labels supplied by the procedure words is structured in
3 levels:

-  a command level that may comprise 256 values (of which only a small number is used),

-  a level of system labels comprising 512 values,

-  a level of subsystem labels that may comprise up to 1?8 groups of 256 values.

Each ISS has a complete view of the first two levels and of the values of the single subsystem
group to which it belongs. This level corresponds to private exchanges between items of equipment
attached to the same subsystem. Exchanges at this level can be modified freely without causing
interference between subsystems, provided the load margins are respected.

A structure with an additional hierarchical level is conceivable.

Besides its flexibility, this scheme limits the addressing capacity needed in
each ISS to 640 values, which allows the capacity of the parameter EPROM to be limited to 1 Kword
(of 32 bits).


DEPENDABILITY

Dependability rests on a very high level of confidence in the detection of operating anomalies of
the ISSs, which requires that they be built with duplicated hardware and comparison of outputs (figure 2).

Together, the ISSs perform:

-  the validation of exchanges at the end of the cycle,

-  the filtering of errors due to transients,

-  the reconfiguration of the bus or their own reconfiguration.


Validation of exchanges at the end of the cycle

As the exchanges proceed, each ISS computes a cyclic code over all
1024 words of the cycle.

This code is inserted on 14 bits in the control word that each ISS broadcasts. The cycle is
validated when all these control words are identical.

Since the causes of failure are independent between the ISSs, and since the ISSs are themselves
perfectly tested by comparison thanks to their duplicated hardware structure and their continuous
exercise, the validation of the cycle performed by each of them, complemented by the parity tests
already carried out on each word exchanged, is practically absolute in its safety.

This validation unlocks the use of the exchanged information by the tasks.
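
The end-of-cycle validation can be sketched as follows. The generator polynomial of the cyclic code is not given in the text, so the 14-bit polynomial below is an arbitrary choice for illustration; the placement of the 2-bit level above the 14-bit code in the control word is likewise an assumption, and the comparison is made on the code field only.

```python
from typing import Iterable, List

CRC_BITS = 14
CRC_MASK = (1 << CRC_BITS) - 1
POLY = 0x0805   # assumed 14-bit generator polynomial, not taken from the text


def crc14_update(crc: int, word: int) -> int:
    """Fold one 16-bit word into the running code, most significant bit first."""
    for i in range(15, -1, -1):
        bit = (word >> i) & 1
        top = (crc >> (CRC_BITS - 1)) & 1
        crc = (crc << 1) & CRC_MASK
        if top ^ bit:
            crc ^= POLY
    return crc


def cycle_code(words: Iterable[int]) -> int:
    """Cyclic code accumulated over all the words observed during a cycle."""
    crc = 0
    for word in words:
        crc = crc14_update(crc, word)
    return crc


def control_word(code: int, level: int) -> int:
    """Pack the 14-bit code and the 2-bit level into one 16-bit word (assumed layout)."""
    return ((level & 0b11) << CRC_BITS) | (code & CRC_MASK)


def cycle_valid(control_words: List[int]) -> bool:
    """The cycle is valid when every ISS reports the same cyclic code."""
    return len({w & CRC_MASK for w in control_words}) == 1


words = list(range(1024))
good = cycle_code(words)
corrupted = list(words)
corrupted[500] ^= 0x0010            # one bit seen differently by one ISS
bad = cycle_code(corrupted)
```

Each ISS accumulates the code from what it actually observed on the line, so a word received wrongly by one ISS shows up as a control word that differs from the others.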


Filtering of transient errors

When an error is observed for the first time during the cycle or at its validation, the cycle
is flagged as faulty, the recovery mode is set and all the ISSs switch to the other line (or stay
on the same one if the second has been lost).

The ISS with the lowest address value that has not made a fault sends, at the start of the next
cycle, a message initializing the current parameters, which fits in a few words (system time,
configuration of the ISSs present, current pointers).

The faulty ISS, which had dropped out of the cycle as soon as the error was detected in order to
wait for this message, is then resynchronized. The unvalidated exchanges, held by the sending tasks,
are then repeated. If this succeeds, the recovery mode is cleared.


Reconfiguration of the ISSs or of the bus

After three unsuccessful attempts by the same ISS, that ISS switches itself off automatically.
This signal switches on a standby ISS, which initializes itself by the same procedure. If there is no
standby ISS, the equipment disappears from the exchanges on the bus.

When more than one control word differs from the others, the error is attributed to a bus
failure. In the event of a persistent fault on the same physical bus, that bus is abandoned.
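
The attribution rule above can be written as a small classifier. In this sketch the reference value is taken to be the most frequent control word, which is an assumption (the text only says that words "differ from the others"), and the attribution of a single deviating word to its ISS is inferred from the surrounding description; the names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

MAX_ATTEMPTS = 3   # unsuccessful attempts before an ISS switches itself off


def classify(control_words: Dict[int, int]) -> Tuple[str, List[int]]:
    """Return ("ok" | "iss" | "bus", deviating addresses) for one cycle."""
    counts = Counter(control_words.values())
    reference, _ = counts.most_common(1)[0]
    deviants = sorted(a for a, w in control_words.items() if w != reference)
    if not deviants:
        return "ok", []
    if len(deviants) == 1:
        return "iss", deviants      # a single ISS disagrees with the rest
    return "bus", deviants          # several disagree: blame the bus


def record_attempt(failures: Dict[int, int], address: int, success: bool) -> bool:
    """Count consecutive failed attempts; True means the ISS must switch off."""
    if success:
        failures.pop(address, None)
        return False
    failures[address] = failures.get(address, 0) + 1
    return failures[address] >= MAX_ATTEMPTS


all_agree = classify({1: 0x2A, 2: 0x2A, 3: 0x2A})
one_differs = classify({1: 0x2A, 2: 0x2A, 3: 0x15})
several_differ = classify({1: 0x2A, 2: 0x11, 3: 0x15, 4: 0x2A})

failures: Dict[int, int] = {}
outcomes = [record_attempt(failures, 3, False) for _ in range(3)]
```

With fewer than three ISSs the most frequent word is not well defined, so the sketch is meaningful only for three or more participants.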


AIDE  A  LA  CONCEPTION 


Parall element  a  1'etude  et  destinee  a  faciliter  sa  raise  en  oeuvre  et  son  evolution,  I 'etude  d'un 
outil  de  concept icn.  est  en  cours  danc  sa  phase  theorique. 

Cet  cutil  CESAR  (3)  (Conception  Evaluation  et  Specification  des  Applications  Reparties)  est  un 
syst.eme  general  d'aide  a  la  conception  d’ apnl  ications  decoupees  en  "boTtes  noires"  &  hangeant  des  messages. 

II  perinet  de  deer  ire  : 

-  1 ‘architecture  iogique  :  decoupage  fonctionnel  en  taches  echangeant  des  Informations  sous  fc-mes  de 
messages, 

-  1 1  architecture  physique  :  decoupage  en  equipements  ayant  des  caracterlstiques  specifiques. 

It allows the level of detail of the description to be adjusted to the needs. Such an approach is
well suited to the top-down design of an entire distributed system, and makes it possible to offer the
following services:

- verification of the consistency of the functional specifications at all levels of description, that is,
a partial and progressive validation throughout the design phase,

- optimisation of the placement of tasks on the equipment, given the logical and physical
specifications,

- automatic documentation of the project during the design phase and the development phase, with an
adjustable level of detail,

- a mathematical model of the system (once a sufficient level of detail has been reached), allowing
static analysis to be used to study compliance with timing constraints and the correctness of the
synchronisation specifications, together with an evaluation of the performance achieved as a function of
system load.

Such a system will make it possible to specify the exchanges between items of equipment, both from the
logical point of view (nature and function of the information exchanged) and from the temporal point of view
(processing durations, time-stamping constraints, sequencing constraints, etc.). It will allow automatic
verification of the consistency of these specifications, as well as determination of the optimal placement of
tasks on the equipment and of the distribution of messages over the different priority levels.

It will also make it possible to verify compliance with the reliability specifications attached to the
different system functions, given the failure rates of the physical resources involved. A Markov model
will make it possible to follow the evolution of the probabilities of the different system states
as a function of time following failures.
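
As an illustration of the kind of Markov model mentioned above, the following sketch computes state probabilities over time for a duplicated bus. The three-state model (both buses up, one up, none up), the assumption of independent failures at a constant rate with no repair, the numerical failure rate and all names are assumptions made for illustration; they are not taken from CESAR or from the reliability calculation referred to in the paper.

```python
import math


def duplicated_bus_state_probabilities(failure_rate_per_hour, hours):
    """Return (P[both buses up], P[one bus up], P[no bus up]) at time `hours`.

    Assumed model: two identical buses, independent failures at a constant
    rate (exponential lifetimes), no repair. This is the closed-form
    solution of the corresponding three-state Markov chain.
    """
    survive = math.exp(-failure_rate_per_hour * hours)  # one bus still up
    both_up = survive * survive
    one_up = 2.0 * survive * (1.0 - survive)
    none_up = (1.0 - survive) ** 2
    return both_up, one_up, none_up


if __name__ == "__main__":
    # Hypothetical failure rate of 1e-4 per hour for each physical bus.
    for t in (1.0, 10.0, 100.0):
        p2, p1, p0 = duplicated_bus_state_probabilities(1e-4, t)
        print(f"t={t:6.1f} h  both={p2:.6f}  one={p1:.6f}  none={p0:.3e}")
```

A model with repair or with more states would normally be solved numerically; the closed form is used here only because this small chain allows it.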


CONCLUSION 


The basic principles of the protocol have been presented, together with the principle on which
dependability rests and for which a predictive reliability calculation has been carried out.

Today, the system-level study is awaiting progress in the results on optical buses. Beyond that, an
experimental mock-up can be built.


REFERENCES 

(1) SIERRA: Systeme d'Integration et d'Echange Reparti et Reconfigurable automatiquement.
Summary report, February 1980, Contr... ORLT

(2) "Ville cablee de Biarritz" programme, prime contractor SAT

(3) J.P. QUEILLE. The CESAR System: An aided design and certification system for distributed applications.
The 2nd International Conference on Distributed Computing Systems, Paris, France, April 08-10 8C


Figure 3 - INTERCONNECTION DIAGRAM ON A FAST BROADCAST NETWORK, 0.5 MWORDS/S (10 MBITS/S)
(Diagram labels: P.C. operations; duplicated optical bus; junction box, 4 sockets per room; electrical
buses, 1 Mword/s, isolated from one another, 15 m.)

THE FOLLOWING CAN BE CONNECTED:

- COMPUTERS

- FAST DISKS

- FAST PRINTERS

- MAGNETIC BUBBLE RECORDERS

- CRT DISPLAYS
ETC.


Figure 4 - DISTRIBUTION OF MESSAGES BY LEVEL (example with 3 levels; axis label: message index)


Figure 5 - STRUCTURE OF THE DYNAMIC FRAME


Figure 6 - SYNCHRONISATION BETWEEN THE ISS AND THE EQUIPMENT
(Diagram labels: tasks; adaptation PROM; equipment.)


Figure 7 - FRAME
(Diagram labels: CRC + management, 64 words; CRC + management, 1024 words; management level, level 3,
level 2, level 1, level 0.)

- Within a level, the bus is allocated to the ISSs in order of increasing address.

- Interruption in favour of higher-priority levels is handled after a management cycle (dotted line).
Latency < 2 ms


S5-1 


DISCUSSIONS 
SESSION  V 

REFERENCE NO. OF PAPER: V-20

DISCUSSOR'S NAME: Dr. G. H. Hunt, AVP Member

AUTHOR'S NAME: A. D. Stern

COMMENT: It is an implicit assumption in your paper that the occurrences of failure in the different
LRU's are completely uncorrelated. It seems to me that there may be some mechanisms for failure and
degradation, particularly those associated with variations in environmental conditions, which are
common to many LRU's and which could correlate to some measure the occurrence of failure. Has the
author been able to satisfy himself that this is not so?

AUTHOR'S REPLY: I agree with the comment entirely. However, common failure modes should be designed
out of the system; otherwise why bother with redundancy? The stage-state method presented is typical of
other reliability analysis methods in that it assumes randomness of failure occurrence. This is
sufficient for architecture reliability trade studies and comparisons, for which most methods are
intended.

There is some work going on in reliability which addresses different failure rates, in recognition
of the fact that most failures occur near "turn-on" time, and that failure rates for a given
device may change if it is in some standby mode.


REFERENCE NO. OF PAPER: V-22

DISCUSSOR'S NAME: K. A. Helps, Smith Industries

AUTHOR'S NAME: R. L. Schwartz

COMMENT: In making your analysis of the SIFT system depend on a sequence of proofs of the equivalence
in certain respects of one model to the next in the sequence from I/O model to Pascal program (figure
5), have you made any estimate of the likelihood of correctness of the proofs necessary to match the
extremely low probability of system failure required (1 in 10^10 per hour), and is there not a need to
have (dissimilar) redundancy of proofs, since human argument is fallible even when supported by
mechanical aids such as theorem provers? (Although a proof is either correct or incorrect, confidence
in a proof is generally not 100%.)

AUTHOR'S REPLY: You are indeed correct that the validity of the overall SIFT verification effort
depends in part on the soundness of the mechanical theorem prover employed. The answer to this is not,
however, to replicate the proof process (with perhaps some sort of majority vote??) or to use a
probabilistic analysis of the proof validity.

The role of any verification attempt is to increase confidence in a system. That a machine-generated
axiomatic proof is correct merely means that the truth of the proposition follows from a
given set of axioms and rules of deduction. This by itself is not difficult to check, either by
machine or by a human reader. This is not the problem area. Determining that the proposition (or
specification) actually expresses a property sufficient to ensure the behavior you intend is the more
difficult problem. That the I/O model, the highest level description of SIFT, expresses the intended
system function must in the end be determined by inspection.

I do not believe, however, that this is the weakest link in the argument for SIFT's reliability.

To me, the weakest argument concerns the reliability assumptions for the underlying hardware which were
made in order to employ a Markov reliability analysis. Assumptions such as the statistical
independence of faults in various processors, and that their occurrence satisfies a negative exponential
distribution, appear to me more bothersome. It is here that further work should be done. Eventually, I
also expect that mechanical theorem provers will be shown to be consistent with a specification,
pushing the problem back one more level.


REFERENCE NO. OF PAPER: V-22
DISCUSSOR'S NAME: J. H. Saltzer, MIT, USA
AUTHOR'S NAME: R. Schwartz

COMMENT: Is it your experience that the process of systematic design and verification is actually
uncovering design errors?

AUTHOR'S REPLY: Yes, I believe the formal specification and verification process has been quite useful
in uncovering incomplete and flawed design decisions. Probably the flaw with the widest significance is
that clock synchronization cannot be guaranteed to withstand a single point failure using triplication
and majority voting. In response to this, a new "interactive consistency" algorithm was developed.

The problem, solution, and its proof can be found in an article by Pease, Shostak, and Lamport in the
Journal of the ACM, April 1980. There are many other instances involving neglected and necessary
constraints on the schedule table, etc. In general, formal specification and verification forces every
possible state of the system to be considered and thus is one of the (if not the) most systematic
analyses possible.
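
The reply above can be made more concrete with a small sketch of a two-round exchange for four processors tolerating one faulty processor, written from the description in the cited Journal of the ACM article as recalled here; it is not SIFT code. Each processor sends its private value to the others, then relays what it received, and records for every other processor the majority of the three reports it holds about it. The function names, the `lie` callback that models a faulty sender, and the use of `None` as the "no majority" value are all assumptions made for illustration.

```python
from collections import Counter

DEFAULT = None  # recorded when the reports about a processor have no majority


def majority(values):
    """Return the strict-majority value among `values`, or DEFAULT."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def interactive_consistency(private, faulty, lie):
    """Two-round exchange for 4 processors and at most 1 faulty processor.

    private: dict processor -> private value
    faulty:  set of faulty processors
    lie:     function (sender, receiver, about) -> value sent by a faulty sender
    Returns dict non-faulty processor -> vector (dict processor -> value).
    """
    procs = sorted(private)
    # Round 1: every processor sends its private value to every other one.
    direct = {r: {} for r in procs}
    for s in procs:
        for r in procs:
            if s != r:
                direct[r][s] = lie(s, r, s) if s in faulty else private[s]
    # Round 2: every processor relays the values it received in round 1.
    relayed = {r: {} for r in procs}
    for relay in procs:
        for r in procs:
            if r == relay:
                continue
            for about in procs:
                if about in (relay, r):
                    continue
                relayed[r][(relay, about)] = (
                    lie(relay, r, about) if relay in faulty
                    else direct[relay][about]
                )
    # Each non-faulty processor takes the majority of its three reports.
    result = {}
    for p in procs:
        if p in faulty:
            continue
        vector = {p: private[p]}
        for q in procs:
            if q == p:
                continue
            reports = [direct[p][q]] + [
                relayed[p][(relay, q)] for relay in procs if relay not in (p, q)
            ]
            vector[q] = majority(reports)
        result[p] = vector
    return result
```

With three processors the same majority step can be defeated by a single faulty processor that sends different values to its two peers, which is the weakness of triplication mentioned in the reply.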


REFERENCE NO. OF PAPER: V-22


DISCUSSOR'S NAME: Schwartz, SRI, USA
AUTHOR'S NAME: Enslow (Livesey, Presenter)

COMMENT: Is your proof effort primarily oriented towards design verification or towards implementation
verification? For example, if I accepted your figure of 10^-10 for fault independence, but asserted
that two processes would deadlock with a probability of 10^-9, does your work address that?

AUTHOR'S REPLY: The proof effort extends from the highest level design specifications down through and
including the Pascal implementation. The highest level model specifies that the system must continue
to apply the correct task function on the correct input values. Any implementation claiming to
implement SIFT must satisfy this; therefore, lower level design decisions which result in deadlock, for
example, will not satisfy this requirement.

I might comment that because we are verifying the validity of the employed fault-tolerance algorithms
within our design verification, the lowest level proof of the Pascal implementation is rather trivial.


REFERENCE NO. OF PAPER: V-23

DISCUSSOR'S NAME: Jim McCuen, Hughes Aircraft, USA
AUTHOR'S NAME: M. Szlachta

COMMENT: Why did you select the MIL-STD-1553 bus as the intertie for the computers?

AUTHOR'S REPLY: We had to choose between things available at the moment. The 1553 is supported by many
people. We are not fully satisfied with it, probably because we are misusing the bus: we are using it
as a processor link and had to change it a little.


REFERENCE NO. OF PAPER: V-23
DISCUSSOR'S NAME: G. Scotti, SELENIA
AUTHOR'S NAME: J. Szlachta

COMMENT: In normal operation only one computer has the possibility to access memory and I/O, while the
second CPU is maintained hot. If the crosscoupler disconnects the nonactive processor from the memory
and the I/O, how is it possible to compare the correct operations between the two processors in the absence
of correct data in the second CPU?

AUTHOR'S REPLY: The crosscoupler prevents only the output and not the input. That means that the
output of the active CPU is written into both memory blocks. The line in figure 3 indicates the active
data flow only. The additional data flow, for example for error checking, is not shown.


REFERENCE NO. OF PAPER: V-23
DISCUSSOR'S NAME: Horst Kister, VDO
AUTHOR'S NAME: Szlachta

COMMENT: What does "synchronization" in this case mean?

AUTHOR'S REPLY: By synchronization we understand the periodic exchange of messages between the
distributed copies of a systems-process. This exchange serves to secure the consistency of a subset of
the global data space.


REFERENCE NO. OF PAPER: V-24

DISCUSSOR'S NAME: Jim McCuen, Hughes Aircraft Co., USA
AUTHOR'S NAME: M. Meraud

COMMENT: Have you built a fiber optic bus (100M to 300M) with 32 taps/drops? If so, have you operated
it successfully?

AUTHOR'S REPLY: Not yet. This was a study. The ISS is under design and we will have it in 1982. As
for the optical part, it is the other company which is in charge of it. We expect it to have it ready
by mid-1982, and a prototype should be ready by the end of that year.


26-1 


PROTOCOL  LEVEL  MODULES  -  FOR  COST  EFFECTIVE  STANDARD  COMPUTER  COMMUNICATION 

Øyvind Hvinden, Yngvar Lundh, Øystein Sandholt
Norwegian Defence Research Establishment, P O Box 25 - N-2007 Kjeller, Norway


SUMMARY 


A set of microcomputer modules for implementation of network front-ends, gateways and specialized
host computers is being developed. A highly modular design approach is taken. One or
more of these protocol modules may be interconnected to constitute the units referred to. A
"library" of tested hardware sub modules is established; new modules may quickly be developed
using these sub modules. A framework for unified protocol implementation and protocol
interconnection is defined. This includes a real time operating system kernel with functions
for buffer management, timing, pseudo parallel process execution and process communication.


1  INTRODUCTION 

An experimental distributed computer system based on local networking is under development at the Norwegian
Defence Research Establishment (NDRE). This effort includes development and investigation of different
local nets, gateways between these nets and existing long haul nets, host computer network interfaces, host
computer network software and specialized host computers. This paper concentrates on techniques for
implementation of "Network Front-Ends" (NFE), gateways and specialized host computers based on microprocessor
technology.

Local network technology is rapidly developing and various network types are needed in our experiments. A
network architecture that permits experimentation with different nets without having to redesign host
computer network interfaces is highly desirable. The front-end technique hides network specific details, and
host interfaces may be standardized independently of the underlying net. This technique has both advantages
and disadvantages compared to "non-intelligent" hardware interfaces where the host carries out all protocol
software. Application independent protocols that are commonly used by all hosts are well suited
for front-end implementation. The implementation effort for these protocols may then be reduced to one
processor type. Carrying out protocol functions in a front-end may offload an expensive host computer
significantly, thus providing more host capacity to the users.

On the other hand a front-end may be a throughput bottleneck for hosts with high performance network
requirements. Increased packet delays may also occur.

For our current applications a front-end technique based on standard microprocessor technology is adequate
in terms of throughput and delay. These applications are narrow band digital voice terminals, terminal
interface units and general host (mini-) computer networking (remote terminal access and file transfer).


2 ARCHITECTURE AND MODULARIZATION OF NETWORK FRONT-ENDS, GATEWAYS AND HOST COMPUTERS

The ISO reference model for "Open Systems Interconnection" (ISO TC97/SC16, 1979) provides the framework
for protocol implementation. Protocols are hierarchically layered, and each protocol communicates internally
only with the protocols above and underneath it in the hierarchy. The hardware and software base for protocol
implementation should preferably support this architecture.

The seven layer ISO hierarchy is divided into three main parts: network layers (1-3), transport control
layer (4) and application oriented layers (5-7). The transport control layer and the layers above are network
independent. Layers 1-3 (physical link, link access and network) are used by all hosts, while different
transport protocols may exist within a network. Choice of transport control protocol depends on network
community (ARPA, X.25, ...) and application (file transfer, speech, ...), which may have different
requirements with respect to reliability and delay. The US-DOD ARPA Transport Control Protocol (ARPA IEN-129, 1980)
is a transport control protocol for extremely reliable communication in all imaginable conditions. Such a
protocol may be "overkill" for local communication on fast, almost error free nets, and a solution with
coexisting transport protocols may be advantageous. The transport protocol shall provide reliable service to
its users, which means that a reliable path from the transport protocol to its users must exist.

The network layers are used by all hosts and gateways and are therefore obviously well suited for front-end
implementation. The transport protocol may also advantageously be implemented in the front-end if a
reliable host-front end interface is obtainable. Transport control protocols are often very complex and resource
demanding both to implement and execute.

Protocols above the transport layer are inherently host specific and in general not suitable for front-end
implementation.
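
The placement guidance in the last three paragraphs can be condensed into a small rule of thumb. The sketch below is only a simplified reading of the text: the function name, the constants and the boolean parameter are hypothetical, and the text itself says "may advantageously", so the result is a suggestion rather than a fixed rule.

```python
FRONT_END = "front-end"
HOST = "host"


def placement(layer, reliable_host_interface):
    """Suggest where an ISO layer (1-7) is implemented, following the text."""
    if not 1 <= layer <= 7:
        raise ValueError("ISO layers are numbered 1 to 7")
    if layer <= 3:
        # Network layers are used by all hosts and gateways.
        return FRONT_END
    if layer == 4:
        # Transport control moves to the front-end only if the path between
        # host and front-end is itself reliable.
        return FRONT_END if reliable_host_interface else HOST
    # Layers 5-7 are host specific.
    return HOST
```

For example, `placement(4, reliable_host_interface=False)` keeps the transport protocol in the host, which corresponds to connecting the host at the network level.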

A host computer may be connected to the front-end by various types of interfaces, parallel or serial.
Parallel interfaces are usually the best solution: they are fast and reliable, and they represent little host
processing overhead, especially if they are DMA driven. Serial interfaces provide a "clean" interface and
permit greater distance between host and front-end than high speed parallel interfaces. A serial interface is
usually used with a link access protocol that provides reliable flow controlled service.




We may now conclude that several combinations of host interface and protocol level for interfacing are
functionally equivalent and of current interest. Which combination to select depends on host type, network type,
application and economy. No method is obviously best for all purposes. It means that a flexible front-end
design that permits use of different nets, different host interfaces and a variable number of protocol
levels is beneficial.

According to the layered structure in the ISO reference model a modular layered front-end design approach
is sensible. We decided to implement one or more protocols on dedicated boards and interconnect these
boards to constitute the units to be developed. These boards are referred to as "Protocol Level Modules"
(PLM) or just "protocol modules".

A protocol module has at least two interfaces, upper and lower. When more than one protocol module is needed
to constitute a unit, the protocol modules in the unit communicate internally by an "Inter Level Interface"
(ILI). A protocol module that interfaces to a host has a "Host Specific Interface" (HSI). The ILI is
standardised while the HSI may differ depending on host type. The lowest level protocol module has a "Network
Specific Interface" (NSI), and protocol modules carrying out applications may have an "Application Specific
Interface" (ASI). Since protocol modules are single board units, several types have to be developed to
cover network and interface combinations. This concept is not practical if extremely many such
combinations are needed. A multi-board processor design may then be a better solution. The following examples
will show how such modules may be used in front-ends as well as in gateways and specialized host computers.
Figure 2.1 shows 3 front-end configurations which we are implementing.


Figure 2.1 Host network front-end interface examples


The low level protocol module contains the local network protocols used by all network stations. Host "A"
is connected to the front-end by the HSI at the network level. The HSI is a parallel interface in this case.
Host "B" is connected by a similar interface at the transport protocol level. Two protocol modules are used
in this front-end, interconnected by the ILI. Host "C" is connected at the network level; X.25 link access
and physical link constitute the HSI here. The scope of X.25 is limited to host front-end access in this
example.
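
The interface rules for front-end units described above can be written down as a small consistency check. This is a sketch only: the `ProtocolModule` record, the `check_front_end` function and the module names are hypothetical, and the check covers front-ends as in Figure 2.1, not gateways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolModule:
    name: str
    upper: str  # "ILI", "HSI" or "ASI"
    lower: str  # "ILI" or "NSI"


def check_front_end(modules):
    """Check a front-end unit given as modules ordered from top to bottom.

    Returns a list of problems; an empty list means the stack follows the
    interface rules described in the text.
    """
    if not modules:
        return ["a unit needs at least one protocol module"]
    problems = []
    if modules[-1].lower != "NSI":
        problems.append("lowest module must have a network specific interface")
    if modules[0].upper not in ("HSI", "ASI"):
        problems.append("top module must face a host (HSI) or an application (ASI)")
    for above, below in zip(modules, modules[1:]):
        if above.lower != "ILI" or below.upper != "ILI":
            problems.append(f"{above.name} and {below.name} must be joined by the ILI")
    return problems


# Host "A": one module, host attached at the network level.
host_a = [ProtocolModule("low level", "HSI", "NSI")]
# Host "B": transport module stacked on the low level module through the ILI.
host_b = [
    ProtocolModule("transport", "HSI", "ILI"),
    ProtocolModule("low level", "ILI", "NSI"),
]
```

Both `check_front_end(host_a)` and `check_front_end(host_b)` return an empty list; a stack whose neighbouring boards do not both present the ILI is reported as a problem.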

In Figure 2.2 protocol modules are used in two gateway configurations. The gateway between A and C is
constituted of protocol modules throughout, one network module for each net with the gateway protocol in a
module in between. Between similar nets this design is simple and straightforward. Gateways between very
different nets may be more complicated, and more powerful computers may then be needed. A (e g mini-)
computer already interfaced to an existing net may be converted to a gateway between two nets by interfacing
it to a front-end connected to the new net. The gateway between network B (long haul store and forward net)
and network C exemplifies this design.

Specialized host computers are machines that are dedicated to special purposes. Typical examples are
terminal interface units for remote terminal access to host computers and speech terminals for digitized,
packetized speech. The number needed of such units may be relatively large and cost-effective solutions are
important. Figure 2.3 shows these two hosts built completely of protocol modules. The speech host has
interfaces to a speech digitizer (vocoder) and key-pad/display for connection control and status; the terminal
host has standard RS-232 interfaces.

We  have  now  discussed  a  system  architecture  based  on  protocol  modules  with  several  examples  of  use,  both 
potential  and  currently  implemented.  Before  going  further  into  design  details,  two  issues  regarding  this 
architecture  will  be  addressed. 

Firstly,  the  number  of  protocol  modules  "stacked"  to  constitute  a  unit  should  be  kept  low  to  keep  both  cost 
and  packet  (traffic)  delays  low.  The  ILI  design  is  particularly  important  here,  and  delays  may  be  kept  very 
low  if  microprocessors  with  block  transfer  capability  or  "Direct  Memory  Access"  (DMA)  are  used. 




Figure 2.2 Use of protocol modules in gateways, an example


Figure 2.3 Protocol module based specialized host computers


Secondly, using more than one protocol module permits "tailored" implementation of protocols. Low level
(e g link) and application protocols (e g terminal controller) often need very fast and efficient
interrupt handling, while medium level protocols (e g network and transport control) are beneficially
implemented under an operating system kernel. Meeting both these requirements with one processor means compromises
and more complex software.


3 PROTOCOL MODULE HARDWARE INTERCONNECTION AND ARCHITECTURE

Based on the system architecture discussed, a set of protocol modules has been developed. The following
main requirements were set as design goals:

- Protocol modules are single board microcomputers

- It must be possible to use different microprocessors as protocol module CPU, both "8 and 16 bitters"

- A standard inter module interface for fast, efficient inter module packet exchange should be defined.
This interface must not lock CPUs tightly together for long times during transfer

- Flexible combination of RAM and EPROM memory should be possible

- The concept should allow exploitation of new powerful microprocessors and associated circuitry as they
become available

- Initial series of protocol modules should serve as experimental tools

A standard inter module interconnection had to be defined. This interface is a very important part of the
system, and several solutions were analyzed before a decision was made. The interface must be parallel, in
order to be fast and simple. The data transfer path must be 8 bits wide to accommodate both 8 and 16 bit
processors. It must be fast enough for DMA [...] Mbyte/s [...] and asynchronous so that fast and
slow processors may communicate without holding each other up. The interface must hide internal memory
organisation. That means that two port memory solutions are unacceptable. A design based on fifo circuits was
selected. Packets of arbitrary length are transferred as one or more segments under software control through
this interface; read and write operations are separated [...] and are completely asynchronous. The fifo
array capacity is [...] bytes. Figure [...] shows [...] with this interface, named ILI.
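
A minimal sketch of segment-wise packet transfer through a fifo is given below to illustrate the principle. It is a software toy model, not the ILI hardware or its driver: the class and function names, the 16 byte capacity used in the usage note and the reader chunk size are assumptions, and the real fifo capacity is not legible in the source.

```python
from collections import deque


class FifoInterface:
    """Toy model of a byte fifo between a writing and a reading module."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._fifo = deque()

    def write_segment(self, data):
        """Writer side: push as many bytes as currently fit, return the count."""
        room = self.capacity - len(self._fifo)
        accepted = data[:room]
        self._fifo.extend(accepted)
        return len(accepted)

    def read_segment(self, max_bytes):
        """Reader side: pop up to max_bytes bytes (possibly none)."""
        count = min(max_bytes, len(self._fifo))
        return bytes(self._fifo.popleft() for _ in range(count))


def transfer(packet, fifo, reader_chunk):
    """Move one packet through the fifo as a sequence of segments."""
    received = bytearray()
    sent = 0
    while len(received) < len(packet):
        # The writer offers the rest of the packet; only what fits is taken.
        sent += fifo.write_segment(packet[sent:])
        # The reader drains at its own pace, independent of the writer.
        received += fifo.read_segment(reader_chunk)
    return bytes(received)
```

With `FifoInterface(capacity=16)`, a 100 byte packet passes through intact in several segments, and neither side needs to know how the other organises its memory.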

The current implementation is based on [...] fifo chips and [...] control and status circuitry.
21 chips are used [...] is occupied for a complete upper and lower ILI. Chip
count may be reduced considerably in future designs. [...] fifo chips complete with control, status
and bus interface circuitry will be available from microprocessor manufacturers in 1981. Two [...] pin chips
can then replace [...] of the present ones. These two interfaces [...] be directly compatible.

Hardware resources needed by all protocol modules are CPU and memory. Any type of CPU may be used with this
concept; the Z80A-CPU has been used in the present design. A counter/timer circuit is needed in many
protocols and is included as a standard device. The CPU must be able to start when power is switched on,
initialize itself and become synchronized with neighbour modules before useful processing may take place.
The program may be partly or completely in EPROM. A flexible memory structure is needed; memory should be
[...] in increments. RAM and EPROM requirements are difficult to predict, and pin compatible RAM/EPROM
are therefore used. Completely debugged programs (if such exist) can be permanently stored in
EPROM. During development it is highly useful to download the program into RAM; only one bootstrap EPROM
is then needed. It is expected [...] for operational use, to have protocol
programs in EPROM.


Four protocol modules have been designed. Figure 3.2 shows their hardware architecture.
All protocol modules in Figure 3.2 contain the CPU and associated reset, buffer and clock circuitry and the
first [...] kbytes block of memory. [...] has both upper and lower ILI and [...] kbytes of memory. "HDLC"
has 32 kbytes memory, upper and lower ILI, a multi protocol link level controller chip (Z80A-SIO) with
[...] line transceivers, bit rate generating circuitry and a [...] bit LED display for debug
purposes. The "HDLC" board may be both on top and at the bottom of a layered protocol module package since it
has upper and lower ILI. [...] has an interface to a narrow band digital voice vocoder, lower ILI
and [...] kbytes of memory. A subset of the internal I/O bus is made accessible for I/O devices not practical
to locate on the [...] board (e g display and key-pad). "RING-NET" contains an interface to a commercially
available [...] ring-net [...], [...] kbytes of memory and upper ILI. It contains two dual port
buffer memories of [...] kbytes each for simultaneous [...] Mbit/s ring send and receive operation.
Several other modules are planned with various host and application specific upper interfaces (e g standard
[...]) and other network interfaces (e g Ethernet).

A DMA chip [...] for some of these modules; such a device should be [...].
[...] data packet transfer between memory and ILI, and memory-memory
transfers consume a significant part of available CPU cycles. A Z80A-DMA performs this 5 times faster
[...]



Figure 3.2 Protocol module block diagram (one block diagram per protocol module)


4 THE HARDWARE MODULE LIBRARY


An interactive layout system was used for the pc-board artwork design. The protocol module family has a
modular architecture where certain sub-modules are used on many boards in the form of "library elements"
in the layout system. Such sub-modules are, e g, CPU and associated circuitry, upper and lower ILI and
memory. Figure 4.1 shows the segment layout of the 4 boards developed.

Sub-modules are located in the same place on all protocol modules. A new board is designed by picking
sub-modules from the library and placing them in an aggregate layout. Reusing sub-modules which have been
used before reduces the development time of new modules substantially.




Figure 4.1 Protocol module segment layout


5  PROTOCOL  MODULE  SOFTWARE  ARCHITECTURE 

A unified software framework has been developed for structured and flexible protocol module interconnection.
This framework supports communication between various software modules within a unit constituted of more
than one protocol module. Such software modules are communication protocols and distributed functions for
network debugging, network experiments and network maintenance. A typical example of protocol module and
protocol interconnection structure is shown in Figure 5.1.

The host computer has various protocols that use different protocols in the front-end package. The lines
between the protocols symbolize logical communication paths between them. We have defined a framework
based on this structure. The abstract term "logical channel" is central in this framework. A logical channel
is a one way communication path between software modules within one protocol module or between software
modules in protocol modules that are neighbours in a hierarchy of layered protocol modules. Messages of
finite length are exchanged on these channels and a rule for flow control is defined. A pool of logical
channels is defined; this pool is divided into two main blocks, internal and external channels.

The external block is divided into two blocks, upward and downward channels. Figure 5.2 shows the logical
channel allocation.

As  protocol  development  continues  protocols  will  be  assigned  fixed  external  logical  channel  numbers,  but 
this  is  not  part  of  the  framework. 
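
The division of the channel pool described above can be illustrated with a minimal sketch. The numeric ranges below are purely hypothetical (the paper does not give the actual channel numbers), and the names `Block`, `BLOCK_RANGES` and `block_of` are introduced only for this illustration.

```python
from enum import Enum


class Block(Enum):
    INTERNAL = "internal"
    UPWARD = "external-upward"
    DOWNWARD = "external-downward"


# Hypothetical channel-number ranges; the real allocation is not given in the text.
BLOCK_RANGES = {
    Block.INTERNAL: range(0, 64),
    Block.UPWARD: range(64, 96),
    Block.DOWNWARD: range(96, 128),
}


def block_of(channel: int) -> Block:
    """Return the block of the pool that a logical channel number belongs to."""
    for block, numbers in BLOCK_RANGES.items():
        if channel in numbers:
            return block
    raise ValueError(f"channel {channel} is outside the pool")
```

With such a table, a protocol that is later assigned a fixed external channel number only needs an entry inside the upward or downward range; the framework itself stays unchanged.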

The logical protocol for multiplexed message exchange is the high level part of the ILI and the HSI
previously described. The low level part of these standards is the interface hardware and associated software
drivers. The logical part of the protocol is independent of the hardware and driver part, which is
subject to changes and new implementations.

We have defined two versions of the logical protocol, full and simplified ILI/HSI. Both depend on 100%
reliable interface hardware and driver service. The difference is the flow control scheme used. The full


Figure 5.2 Logical channel block allocation (blocks shown: upper in channels, upper out channels, internal channels, lower out channels, lower in channels)


simplified version does not utilize these "request messages", and a sending "process" may overflow the
receiving "process", so that the interface becomes temporarily blocked until the receiving "process" is able
to accept a new message.

Message exchange can only take place on logical channels known in both protocol modules; messages sent on
channels not allocated to specific "processes" end up in a "black hole" without any further notification
to the sender "process".
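
The "black hole" rule can be made concrete with a small sketch. This is only a toy model under stated assumptions: the class `ChannelSwitch` and its methods are hypothetical names, a receiving "process" is reduced to a queue, and flow control is left out.

```python
from collections import deque


class ChannelSwitch:
    """Toy model of message delivery on logical channels (illustrative only)."""

    def __init__(self):
        self.queues = {}  # channel number -> deque of pending messages

    def allocate(self, channel: int) -> None:
        """Bind a channel to a receiving process (here simply a queue)."""
        self.queues[channel] = deque()

    def send(self, channel: int, message: bytes) -> None:
        """Deliver a message; an unallocated channel swallows it silently."""
        queue = self.queues.get(channel)
        if queue is None:
            return  # "black hole": no error and no notification to the sender
        queue.append(message)

    def receive(self, channel: int):
        """Return the next pending message on an allocated channel, or None."""
        queue = self.queues[channel]
        return queue.popleft() if queue else None
```

The design choice worth noting is that `send` never reports failure, so a sender cannot distinguish a delivered message from a discarded one; any such check has to be done by the protocols above.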


6  STANDARD  SOFTWARE  MODULES

Certain parts of the protocol module software are similar on many protocol modules. I/O drivers are a typical
example. Most protocols need functions for buffer memory management, message exchange and timing. Protocols
above level 1 are implemented as real time programs, and several protocols may share one CPU for execution.
Some protocols are so complex that they must be divided into sub-modules that are to be treated as
separate units for execution. A real time operating system that supports pseudo-parallel execution of such
units or processes would simplify protocol implementation considerably. Such an operating system kernel
imposes some processing overhead and is not well suited for protocol implementations with extreme response
requirements (e.g. link level).




creation, buffer management, process communication and process scheduling. The process communication system
is the logical channel system previously described. It supports both internal and external logical
channel communication. This package consists of an inner kernel and a number of kernel processes
that execute concurrently with processes carrying out the communication protocols. The ILI/HSI are
implemented as kernel processes which have priority over protocol processes.
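
The priority of kernel processes over protocol processes can be sketched as a two-class ready queue. This is not the kernel's actual scheduler, which the paper does not describe; the names `KERNEL`, `PROTOCOL` and `ReadyQueue` are assumptions made for the illustration.

```python
import heapq
from itertools import count

KERNEL, PROTOCOL = 0, 1  # lower value runs first; the names are illustrative


class ReadyQueue:
    """Ready queue in which kernel processes always precede protocol processes."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within one class

    def make_ready(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_process(self) -> str:
        """Remove and return the name of the process that should run next."""
        return heapq.heappop(self._heap)[2]
```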

The kernel is now being used in an ARPA TCP implementation effort which will need all kernel functions. The
kernel design is not finished and frozen; we expect to modify it to meet new needs and to adapt it to new
processors. It is written in the Zilog PLZ/SYS system implementation language (Snook, T 1978), which allows
migration to 16 bit processors with little conversion effort.


7  CONCLUSION 

A system has been described for "host independent" implementation of communication protocols. References
have been made to a preliminary design of such modules for experimental purposes. We believe that "network
front-ending" using such a modular approach may have merit in future computer communications. Some of the
reasons are: Maintenance and further development of the front-ended protocols become independent of the
various host computer types. The work load on the host can be reduced substantially, which could be
important for economy. Practical, application oriented implementations may exploit the expected further
development of more powerful circuit technology, both microprocessors and more specialized circuits.

The most obvious limitations of "front-ending" are associated with the delay of packets travelling through
the front-end. Further study of the details of these limitations is under way. These investigations will
establish quantitative factors for throughput and delay and, on the other hand, circuit and program
performance requirements for various situations. Certain applications with extreme performance requirements will
probably still be better served by utilizing host computer capacity for the protocol logic, in the
conventional manner.


References 


ARPA IEN-129, 1980, "DOD Standard Transmission Control Protocol", ARPA RFC: 761, IEN-129, Prepared for
DARPA by Information Sciences Institute, University of Southern California

ISO/TC97/SC 16, 1979, "Reference model of Open Systems Interconnection", ISO/TC97/SC 16 Working Document
N227

Hvinden Ø, 1981, "The Paradis Kernel Software Package", FFI/NOTAT-81/7035, Norwegian Defence Research
Establishment

Snook T, Bass C, Roberts J, Nahapetian A, Fay M, 1978, "Report on the Programming Language PLZ/SYS",
Springer Verlag




Retransmission Strategies for Error Control
in Data Transfer Protocols

GUY JUANOLE

Laboratoire d'Automatique
et d'Analyse des Systèmes du C.N.R.S.

7, avenue du Colonel Roche
31400 TOULOUSE - FRANCE


SUMMARY

This paper consists of, on the one hand, a general presentation of the transfer of data (packets) over a transmission line, subject to error control based on error detection and retransmission after error detection, and, on the other hand, the definition and presentation of the various retransmission strategies.

The general presentation of data transfer is based on a hierarchical model with several levels, in which each level uses the services of the level below.

This approach, essential for a clear view of the various functions needed for the transfer, also makes it possible to distinguish clearly two levels in error control:

-  an upper level concerned with control over the arrival of packets (implementing mechanisms for numbering packets, responding to numbered packets and retransmitting unacknowledged numbered packets),

-  a lower level concerned with control over the content of numbered packets and responses (implementing error-detecting codes).

Considering these two levels is essential for defining the various retransmission strategies clearly and precisely. We define two classes of strategies: class 1, in which retransmission is due solely to a timeout implemented in the upper level; class 2, in which retransmission is also triggered by errors detected by the lower level, which the latter signals to the upper level.

Within each class, we define the various strategies that result from the different possible modalities for:

a)  the sending of numbered packets by the source of these numbered packets,

b)  the acceptance of numbered packets by the sink of these numbered packets,

c)  the way in which the sink acknowledges the accepted numbered packets.

This presentation of the various retransmission strategies seems to us an essential first step before undertaking their formal modelling with a view, on the one hand, to verifying their logical validity and, on the other hand, to implementing them.


List of symbols used

P_xi          Process of level x (x = 1, 2, 3, 4, 5) in computer C_i
TEMP          Timeout (TEMPorisation)
MT            Transmission Machine (Machine de Transmission)
[PQ]_i        Packet sent by P_5i and intended for P_5j
[NPQ]_i       Numbered packet sent by P_4i and intended for P_4j
[RP]_j        Response sent by P_4j and intended for P_4i
[RP(ACC)]_j   [RP]_j that is an acknowledgement of receipt (ACCusé de réception)
[RP(RET)]_j   [RP]_j that is a request for retransmission (RETransmission)
[RNPQ]_i      Redundant [NPQ]_i sent by P_3i and intended for P_3j
[RRP]_j       Redundant [RP]_j sent by P_3j and intended for P_3i
[TS]_i        Serial frame (Trame Série) sent by P_2i and intended for P_2j
[TS]_j        Serial frame sent by P_2j and intended for P_2i
[SE]_i        Error signal (Signal d'Erreur) sent by P_3i and intended for P_4i
[SE]_j        Error signal sent by P_3j and intended for P_4j
[PQs], [NPQs], [RPs], ...   plural of [PQ], [NPQ], [RP], ...


INTRODUCTION

The reliability of distributed applications (which are currently multiplying given the development of geographically distributed computer systems) depends, in particular, on the reliability of data transfer over transmission lines, which therefore gives full importance to the error control applied to this transfer. In general, this error control is based on the following principles: error detection and retransmission after error detection.


The system on which our analysis is based consists of two computers C_i and C_j, located at two distant sites and connected by a point-to-point link. We call a packet the data structure to be exchanged between C_i and C_j over the transmission line. We consider only the case of unidirectional packet transfer; indeed, the principles of the retransmission strategies for this type of transfer are also found in bidirectional transfers.

The analysis comprises two parts: in the first part, we represent the data transfer by a hierarchical model, which allows us to visualize the two levels of error control and the classes of retransmission strategies; in the second part, we define the various retransmission strategies.

1.  HIERARCHICAL MODEL OF THE DATA TRANSFER

1.1  General presentation of the system considered

The system considered is shown in Figure 1:

-  P_5i represents both the application processes of C_i and the processes needed to transform the data structures of these applications into packets to be sent to C_j; P_5j represents both the application processes of C_j and the processes needed to transform the packets received into data structures meaningful to these applications,

-  the processes of levels 4, 3, 2 and 1 implement the main functions needed for the transfer of data (a sequence of packets) from C_i to C_j over a transmission line (level 0).

The multilevel decomposition leads us to distinguish two types of protocols: protocols concerning the cooperation between distant processes in the same level, which we call "level protocols"; and protocols concerning the cooperation between processes in two adjacent levels of one computer, which we call "protocols between two levels" (a "protocol between two levels" is characterized by a set of primitives relating to the service that a level requests from the level immediately below). We do not consider "protocols between two levels" here.

We now describe the main functions of the protocols of levels 4, 3, 2 and 1, with particular emphasis on levels 4 and 3, which implement error control.

1.2  The protocols of levels 4, 3, 2 and 1

1.2.1  Level 4 protocol

The purpose of this protocol is to check whether the [PQs]_i arrive at their destination (the lower levels may lose information passing through them). To carry out this check, the following exchanges take place by means of level 3: [NPQs]_i from P_4i to P_4j and [RPs]_j from P_4j to P_4i. Given this check, the functions of P_4i and P_4j are:

-  P_4i numbers the [PQs]_i that P_5i asks it to send (forming the [NPQs]_i), transmits these [NPQs]_i to P_4j and waits for [RPs(ACC)]_j in order to be able to take new [PQs]_i and hence transmit new [NPQs]_i; the [NPQs]_i whose [RPs(ACC)]_j are not received are retransmitted (retransmission strategy).

In general, retransmission occurs at the end of a TEMP (METCALFE, 73) or when P_4i receives [RPs(RET)]_j or [SEs]_i (we shall see in the analysis of level 3 that P_3i is able to detect errors and can therefore signal these errors to P_4i). Note also that retransmission can be envisaged when P_4i receives unexpected [RPs(ACC)]_j (that is, [RPs(ACC)]_j concerning [NPQs]_i that were not sent).

-  P_4j accepts or rejects a received [NPQ]_i according to whether or not its number is an expected number, and sends [RPs(ACC)]_j (the fact that P_4j sends an [RP(ACC)]_j when it rejects an [NPQ]_i may seem surprising to readers unfamiliar with retransmission strategies; we shall explain this in paragraph II). P_4j may also receive [SEs]_j (P_3j, like P_3i, is able to detect errors) and in that case P_4j sends [RPs(RET)]_j.


P_4j must also provide one further function: maintaining, with respect to level 5, the sequence of the [PQs]_i; that is, P_4j must reorder the accepted [NPQs]_i (in the case where it can accept them out of order) before it can, after removing the number, transmit the [PQs]_i to P_5j in the order in which they were transmitted to P_4i by P_5i. The numbers in the [NPQs]_i and the [RPs]_j constitute the "service information" of level 4.
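
The reordering function of P_4j can be sketched as follows. This is only an illustration under stated assumptions: the class name `Resequencer` is hypothetical, numbers are taken as unbounded integers starting at 0, and acceptance (window) checks are assumed to have been done beforehand.

```python
class Resequencer:
    """Hands packets to level 5 in number order, whatever the arrival order."""

    def __init__(self, first_number: int = 0):
        self.expected = first_number  # number of the next packet owed to level 5
        self.held = {}                # accepted packets waiting for predecessors

    def accept(self, number: int, packet):
        """Store an accepted numbered packet; return the packets now deliverable."""
        self.held[number] = packet
        deliverable = []
        while self.expected in self.held:
            deliverable.append(self.held.pop(self.expected))
            self.expected += 1
        return deliverable
```

For example, if packets 1 and 2 are accepted before packet 0, nothing is delivered until packet 0 arrives, at which point all three are passed up in order.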


1.2.2  Level 3 protocol

The purpose of this protocol is to check whether the content of the [NPQs]_i and the [RPs]_j is erroneous (the lower levels may corrupt information passing through them). To carry out this check, the following exchanges take place by means of level 2: [RNPQs]_i from P_3i to P_3j and [RRPs]_j from P_3j to P_3i. Given this check, the functions of P_3i and P_3j are:

-  P_3i (P_3j) adds a redundancy block to each [NPQ]_i ([RP]_j) that P_4i (P_4j) asks it to send and therefore transmits, each time, an [RNPQ]_i ([RRP]_j) to P_2i (P_2j),

-  P_3i (P_3j) tests the redundancy block associated with each [RRP]_j ([RNPQ]_i) passed to it by P_2i (P_2j): if the test is satisfactory, P_3i (P_3j) removes the redundancy block and transmits the [RP]_j ([NPQ]_i) thus obtained to P_4i (P_4j); if the test is not satisfactory, P_3i (P_3j) transmits no [RP]_j ([NPQ]_i) to P_4i (P_4j) but may instead transmit an [SE]_i ([SE]_j).

The redundancy blocks are the "service information" of level 3.
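
The two level 3 functions (adding and testing a redundancy block) can be sketched as below. The paper does not say which error-detecting code is used; CRC-32 from Python's standard `zlib` module is an assumption made for the illustration, as are the function names and the use of the string "SE" for the error signal.

```python
import zlib


def add_redundancy(npq: bytes) -> bytes:
    """Sender side of level 3: append a 4-byte CRC-32 to form the redundant packet."""
    return npq + zlib.crc32(npq).to_bytes(4, "big")


def check_redundancy(rnpq: bytes):
    """Receiver side of level 3: return (payload, None) or (None, "SE")."""
    if len(rnpq) < 4:
        return None, "SE"
    payload, received = rnpq[:-4], int.from_bytes(rnpq[-4:], "big")
    if zlib.crc32(payload) == received:
        return payload, None  # test satisfactory: pass the packet up to level 4
    return None, "SE"         # test failed: no packet, optional error signal
```

In class 1 strategies the "SE" result would simply be dropped; in class 2 it would be passed to level 4.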

1.2.3  Level 2 protocol

This protocol provides the transmission of information in serial form (bits). To this end, the following exchanges take place by means of level 1: [TSs]_i from P_2i to P_2j and [TSs]_j from P_2j to P_2i. The article by [...] sets out exhaustively the functions that P_2i and P_2j must perform.




1.2.4  Level 1 protocol

This protocol allows the exchange of bits by means of a transmission line. The work of (LUCKY, 68) indicates the functions that P_1i and P_1j must provide.

1.3.  Model considered for defining and presenting the retransmission strategies

Since the retransmission mechanism is worked out from P_4i (level 4), either following error situations detected by the TEMP in P_4i or following error situations detected in level 3 and signalled to level 4, we consider the 3-level model shown in Figure 2:

-  level 5 represents the level requesting the service for which the retransmission mechanism is necessary,

-  level 4 represents the level that implements this mechanism,

-  the Transmission Machine (MT), which encompasses the lower levels, represents the overall machine used by level 4 to implement the retransmission mechanism.

1.4.  The two main classes of retransmission strategies

Class 1: Retransmission results only from the TEMP in P_4i. In this class, the MT does not transmit the [SEs]_i and [SEs]_j to level 4 and therefore, in particular, the [RPs(RET)]_j do not exist. Moreover, P_4i takes no account of any unexpected [RPs(ACC)]_j that it receives.

Class 1 can be said to represent the class of retransmission strategies with the minimum number of actions causing retransmission.

Class 2: In this class, the MT transmits the [SEs]_i and [SEs]_j to level 4, which takes them into account in order to trigger the retransmission mechanism as at the end of the TEMP (class 2 therefore encompasses class 1).


The purpose of class 2 is twofold: on the one hand, to take into account the intelligence of level 3; on the other hand, to allow a better information throughput than class 1, by activating retransmission.

Note further that, because of the signal [SE]_j, there must as a general rule be two types of [RP]_j, [RP(ACC)]_j and [RP(RET)]_j, whereas only the first type exists in class 1.
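
The difference between the two classes, seen from P_4i, can be summarized in a short sketch. The event names ("TEMP_END", "RP_RET", "SE") are labels chosen for this illustration and do not appear in the paper.

```python
# Events that start the retransmission mechanism in P_4i, per class.
CLASS_1_TRIGGERS = {"TEMP_END"}
CLASS_2_TRIGGERS = CLASS_1_TRIGGERS | {"RP_RET", "SE"}


def triggers_retransmission(strategy_class: int, event: str) -> bool:
    """Return True if the event starts retransmission in the given class."""
    triggers = CLASS_1_TRIGGERS if strategy_class == 1 else CLASS_2_TRIGGERS
    return event in triggers
```

Building the class 2 set from the class 1 set mirrors the remark that class 2 encompasses class 1.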

1.5.  Assumptions for the definition of the retransmission strategies

We consider that:

-  under normal operating conditions (no errors), P_4j sends an [RP(ACC)]_j for each accepted [NPQ]_i; we do not consider here the case where P_4j would receive several [NPQs]_i and then send a single [RP(ACC)]_j cumulatively acknowledging all of these [NPQs]_i (as in the HDLC protocol, NRM mode, with the P/F bits (MACCHI, 79)),

-  a TEMP is associated, in P_4i, with each [NPQ]_i sent,

-  the duration of the TEMP associated with each [NPQ]_i sent is such that an [RP]_j to this [NPQ]_i is either obtained by P_4i before the end of the TEMP or is not obtained at all (we are not concerned here with transmission media introducing delays that could cause an [RP]_j to arrive after the end of the TEMP and therefore during the retransmission operation).


2.  THE RETRANSMISSION STRATEGIES

2.1.  Class 1

2.1.1.  Definition of the strategies


The first consideration, in defining the various strategies, concerns the two possible modalities for the sending of [NPQs]_i by P_4i:

-  mode 1, i.e. P_4i sends only one [NPQ]_i and then waits either for the [RP(ACC)]_j relating to this [NPQ]_i, in order to be able to send the next [NPQ]_i, or for the end of the TEMP associated with this [NPQ]_i, in order to resend it,

-  mode 2, i.e. P_4i sends several [NPQs]_i (we consider (q+1), where q is a positive integer whose value is an implementation constraint) and then waits "for [RPs(ACC)]_j or the end of the TEMP associated with the first of the (q+1) [NPQs]_i" (this phrase in quotation marks is vague: in what order must the [RPs(ACC)]_j arrive to be taken into account, and what action is taken when an [RP(ACC)]_j has been taken into account? What action is taken at the end of the TEMP indicated? We shall remove the vagueness of this phrase in the rest of this paragraph by specifying the possible behaviours of P_4i); note that we assume that no [RP]_j can be received by P_4i before the end of the sending of the (q+1)th [NPQ]_i.

Using the notion of an [NPQ]_i in transit (an [NPQ]_i sent by P_4i but whose [RP(ACC)]_j has not yet been obtained by P_4i), modes 1 and 2 can also be distinguished as follows: in mode 1, there is one [NPQ]_i in transit; in mode 2, there are (q+1) [NPQs]_i in transit.

The characteristics of mode 1 are: P_4j necessarily accepts the [NPQs]_i in sequence; after accepting an [NPQ]_i, P_4j immediately sends the [RP(ACC)]_j relating to this [NPQ]_i.
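
Mode 1 can be sketched as a simple loop in which one numbered packet at a time is in transit. This is a deterministic toy model, not the paper's formal description: the function name and the `lost_attempts` parameter are assumptions, and the loss of an [NPQ] and the loss of its [RP(ACC)] are not distinguished since both end with the TEMP expiring in P_4i.

```python
def stop_and_wait(packets, lost_attempts):
    """Send packets one at a time; return the log of (number, attempt) sendings.

    lost_attempts is a set of (number, attempt) pairs for which either the
    numbered packet or its acknowledgement is lost, so that the TEMP expires.
    """
    log = []
    for number, _packet in enumerate(packets):
        attempt = 1
        while True:
            log.append((number, attempt))
            if (number, attempt) not in lost_attempts:
                break     # acknowledgement obtained before the end of the TEMP
            attempt += 1  # TEMP expired: resend the same numbered packet
    return log
```

Calling `stop_and_wait(["a", "b"], {(0, 1)})` models the loss of the first sending of packet 0: the log shows packet 0 sent twice before packet 1 is sent once.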


The second consideration concerns, under mode 2, the two possible policies for the acceptance of [NPQs]_i by P_4j:

-  policy 1, i.e. P_4j accepts the (q+1) [NPQs]_i only in sequence (as in mode 1),

-  policy 2, i.e. P_4j accepts the (q+1) [NPQs]_i in any order.




Policy 1 has the following immediate consequences:

-  if the first of the (q+1) [NPQs]_i is not obtained by P_4j, P_4j will normally reject the q following [NPQs]_i,

-  hence, given this behaviour of P_4j, that of P_4i can now be specified (cf. the vague phrase): P_4i waits only for the [RP(ACC)]_j relating to the first of the (q+1) [NPQs]_i and, when it obtains it, P_4i can then send a new [NPQ]_i; at the end of the TEMP relating to the first of the (q+1) [NPQs]_i, P_4i automatically retransmits the (q+1) [NPQs]_i, i.e. we have what we call a global retransmission ("go back" procedure (BURTON, 72; NERI, 77)); global retransmission makes it possible to handle the situations in P_4j that follow from the first of the [NPQs]_i not being obtained. Note that the waiting state in P_4i is the same before and after the global retransmission.
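
The interplay between in-sequence acceptance (policy 1) and global retransmission can be sketched as follows. The function names are hypothetical and the model ignores timing: it only shows which numbers are accepted and which are resent.

```python
def receiver_in_sequence(expected, arrivals):
    """Policy 1: accept only the expected number; return (accepted, next expected)."""
    accepted = []
    for number in arrivals:
        if number == expected:
            accepted.append(number)
            expected += 1
        # any other number is rejected
    return accepted, expected


def global_retransmission(window):
    """At the end of the TEMP of the first packet, resend the whole window."""
    return list(window)
```

With a window of packets 0, 1 and 2 where packet 0 is lost, the receiver rejects 1 and 2; after the global retransmission all three are accepted in sequence.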


Policy 2 has one immediate consequence: the need for P_4j to reorder the [NPQs]_i accepted out of order.

The third consideration concerns, under mode 2 and policy 2, the two possible techniques for P_4j to acknowledge the accepted [NPQs]_i: we distinguish what we call the collective ACC and the individual ACC.

With the collective ACC technique, P_4j sends an [RP(ACC)]_j for an accepted [NPQ]_i only if all the [NPQs]_i with numbers lower than its own have been accepted. Hence, with this technique, P_4j can acknowledge an accepted [NPQ]_i in several ways: we say, relative to an [NPQ]_i, that an [RP(ACC)]_j is an ACC of order 0, or of order 1, or ..., according to whether it acknowledges this [NPQ]_i and no [NPQ]_i with a higher number, or acknowledges this [NPQ]_i and the [NPQ]_i with the immediately higher number, and so on.

With the individual ACC technique, P_4j acknowledges each accepted [NPQ]_i at the instant of acceptance (as in mode 1 and in mode 2 with policy 1).
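
The notion of an ACC of order k under the collective technique can be made concrete with a small function. The name `collective_ack` and the representation of accepted packets as a set of integers are assumptions made for this sketch.

```python
def collective_ack(base, accepted):
    """Return (number, order) of the collective ACC to send, or None.

    base is the lowest number not yet acknowledged and accepted is the set of
    numbers accepted so far. An ACC of order k relative to base acknowledges
    base and the k consecutive numbers immediately above it.
    """
    if base not in accepted:
        return None  # a lower-numbered packet is missing: no ACC can be sent
    order = 0
    while base + order + 1 in accepted:
        order += 1
    return base, order
```

If packets 1 and 2 were accepted before packet 0, no ACC is sent until packet 0 is accepted, and the ACC then sent is of order 2.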


The collective ACC technique has the following immediate consequences:

-  according to whether P_4j does or does not obtain the first of the (q+1) [NPQs]_i, P_4j will send the [RP(ACC of order 0)]_j relating to the first of these (q+1) [NPQs]_i or will send no [RP]_j at all, even if it accepts [NPQs]_i among the q following ones,

-  given the way P_4j responds, it can be seen that the behaviour of P_4i (cf. the vague phrase) must be identical to that indicated for mode 2 with policy 1 (note: whether or not P_4j has accepted [NPQs]_i among the q [NPQs]_i following the first, P_4i does not know, and global retransmission is therefore the only means of handling all the possible situations in P_4j),

-  the waiting state in P_4i after the global retransmission is different from that before the global retransmission (a difference from the case of mode 2 with policy 1); that is, P_4i now waits for any [RP(ACC of order 0)]_j relating to any of the (q+1) retransmitted [NPQs]_i (or, speaking in terms of the first of the (q+1) [NPQs]_i, P_4i waits for any ACC from order 0 to order q): indeed, when the retransmitted [NPQs]_i arrive at P_4j, and if the first is now accepted, P_4j can, given the different possibilities of [NPQs]_i accepted before the global retransmission, now send the whole possible range of [RPs(ACC)]_j; note also that, given the semantics of the first [RP(ACC)]_j obtained after the global retransmission, P_4i may then send several new [NPQs]_i (indeed, (q+1) [NPQs]_i can be put in transit).

The individual ACC technique has the following immediate consequences: even if P_4j does not obtain the first of the (q+1) [NPQs]_i, P_4j will send an [RP(ACC)]_j for each [NPQ]_i accepted among the q following ones; hence P_4i must take into account any [RP(ACC)]_j relating to the (q+1) [NPQs]_i. The behaviour of P_4i can now be specified (cf. the vague phrase). Until the end of the TEMP associated with the first of the (q+1) [NPQs]_i, P_4i waits for:

a)  the [RP(ACC)]_j for this first of the (q+1) [NPQs]_i; when it is obtained, P_4i can then send a new [NPQ]_i,

b)  also, [RPs(ACC)]_j relating to the q following [NPQs]_i (whether these can be obtained obviously depends on the duration of the TEMP considered); these [RPs(ACC)]_j are recorded, and the sending of new [NPQs]_i following the obtaining of these [RPs(ACC)]_j will be considered only after the end of the TEMP considered.

At the end of the TEMP associated with the first of the (q+1) [NPQs]_i, only this [NPQ]_i is retransmitted (selective retransmission (METZNER, 77; EASTON, 79)).
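
What P_4i does at the end of the TEMP of the first packet, under the individual ACC technique, can be sketched as below. The function name and the representation of the window as a list of numbers are assumptions made for this illustration; timers are not modelled.

```python
def at_timer_expiry(window, recorded_accs):
    """Return (to_resend, still_in_transit) at the end of the first packet's TEMP.

    window lists the numbers sent, first packet first; recorded_accs is the set
    of numbers whose individual acknowledgement has been recorded so far.
    """
    first = window[0]
    # Selective retransmission: only the first packet is resent, if unacknowledged.
    to_resend = [] if first in recorded_accs else [first]
    still_in_transit = [n for n in window if n not in recorded_accs]
    return to_resend, still_in_transit
```

Compared with global retransmission, the packets already acknowledged individually are not sent again, which is where the saving in line capacity comes from.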

The fourth consideration concerns, under mode 2 with policy 2 and the individual ACC technique, the two possible behaviours of P_4i after the retransmission of the first of the (q+1) [NPQs]_i: behaviour 1 or behaviour 2, according to whether P_4i does not send or does send a new [NPQ]_i for each [RP(ACC)]_j obtained relating to the q following [NPQs]_i; with behaviour 2, and during the retransmission considered, q new [NPQs]_i may be sent (indeed, (q+1) [NPQs]_i can be in transit).

However, with behaviour 2, one can envisage the case where P_4i keeps retransmitting the first [NPQ]_i considered and sends many new [NPQs]_i. This behaviour must have limits because, in particular, the set of numbers used for numbering the [NPQs]_i is not infinite. We call y the maximum number of [NPQs]_i that can be sent while the first of the (q+1) [NPQs]_i is still in transit (y > q). The value of y is another implementation constraint (after the value of q).


These four considerations lead us to define the following five strategies: strategy I, which concerns mode 1; strategy II, which relates to mode 2 with policy 1; strategy III, which relates to mode 2 with policy 2 and the collective ACC; strategies IV and V, which concern mode 2, policy 2, the individual ACC and, respectively, behaviour 1 and behaviour 2.
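
The five strategies can be tabulated in a few lines. The sketch below encodes them from the four considerations, taking the ACC techniques as applying under policy 2, where they were defined; the type `Strategy` and the helper names are introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    mode: int
    policy: Optional[int] = None     # acceptance policy of P_4j (mode 2 only)
    acc: Optional[str] = None        # "collective" or "individual" (policy 2 only)
    behaviour: Optional[int] = None  # behaviour of P_4i (individual ACC only)


STRATEGIES = {
    "I": Strategy(mode=1),
    "II": Strategy(mode=2, policy=1),
    "III": Strategy(mode=2, policy=2, acc="collective"),
    "IV": Strategy(mode=2, policy=2, acc="individual", behaviour=1),
    "V": Strategy(mode=2, policy=2, acc="individual", behaviour=2),
}


def retransmission_kind(name: str) -> str:
    """Kind of retransmission at the end of the TEMP of the first packet."""
    s = STRATEGIES[name]
    if s.mode == 1:
        return "single"
    return "selective" if s.acc == "individual" else "global"


def max_in_transit(name: str, q: int) -> int:
    """Maximum number of numbered packets in transit: 1 in mode 1, q+1 in mode 2."""
    return 1 if STRATEGIES[name].mode == 1 else q + 1
```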




2.1.2.  General remarks on the retransmission strategies

2.1.2.1.  The resources required in P_4i and P_4j

The number of buffers and timers in P_4i is fixed by the maximum number of [NPQs]_i in transit: 1 buffer and 1 timer in strategy I; (q+1) buffers and (q+1) timers in strategies II, III, IV and V.

The number of buffers in P_4j is fixed by the maximum number of [NPQs]_i that P_4j can accept before being able to transmit [PQs]_i to P_5j: 1 buffer in strategies I and II; (q+1) buffers in strategies III and IV; (y+1) buffers in strategy V.

2.1.2.2.  Fundamental states of the transfer of the [NPQs]_i

These are the waiting states, in P_4i and P_4j, which are characterized by "context information": the window F_i in P_4i and the window F_j in P_4j. (A window is a subset of the sequence of natural numbers: the smallest number and the largest number are called the left corner and the right corner respectively.)


The window F_i comprises the numbers of the [NPQs]_i that have been sent and whose [RPs(ACC)]_j are awaited.

The window F_j comprises the numbers of the [NPQs]_i that are expected and will therefore be accepted if they are received.

The windows F_i and F_j thus reflect the state of the resources used in P_4i and P_4j for the transfer of the [NPQs]_i.

2.1.2.3. Progress of the transfer

This progress is characterised by the evolution of the waiting states in P4i and P4j, that
is, more precisely, by changes in the windows F4i and F4j.

These changes depend both on the results of the transfer of the [NPQ]i and on the relations
between level 4 and level 5 (P4i occupies its resources with the [PQ]i coming from P5i, and
P4j frees its resources by passing [PQ]i to P5j).

It is absolutely essential that the changes in windows F4i and F4j be synchronised.

A fundamental rule of this synchronisation is: when P4j sends an [RP(ACC)]j, its window F4j
must be positioned so that it can accept the [NPQ]i that P4i may send to it on receiving
that [RP(ACC)]j.

2.1.2.4. Response sent by P4j when it rejects a received [NPQ]i

P4j rejects a received [NPQ]i whenever the number of that [NPQ]i is incompatible with the
window F4j. This incompatibility has one of the following causes:

1 - The [NPQ]i was previously accepted by P4j, but the [RP(ACC)]j sent by P4j was not
obtained by P4i because:

a) either it was lost in the lower levels,

b) or its content was corrupted by transmission through the lower levels, and the resulting
errors were not detected by the level-3 redundancy-block test and affect its semantics.

2 - The [NPQ]i is being sent for the first time by P4i, but:

a) either its content was corrupted by transmission through the lower levels, and the
resulting errors were not detected by the level-3 redundancy-block test and affect its
number so that it no longer corresponds to an expected number,

b) or (in strategy II only) it is any one of the q [NPQ]i following the first of the (q+1)
[NPQ]i, when that first one was corrupted as described in 2.a,

c) or (again in strategy II only) it is any one of the q [NPQ]i following the first of the
(q+1) [NPQ]i, when that first one was lost in the lower levels.

When P4j rejects a received [NPQ]i, it replies with:

- in strategies I, II, IV and V, the [RP(ACC)]j relating to that [NPQ]i,

- in strategy III, the last [RP(ACC)]j sent.

This way of responding makes it possible to control the error situations listed above:

- in cases 1.a or 1.b, P4i will obtain an expected [RP(ACC)]j,

- in cases 2.a, 2.b or 2.c, P4i will obtain an unexpected [RP(ACC)]j, will therefore ignore
it, and will retransmit the [NPQ]i concerned at the end of the TEMP associated with it.

Note that, if one assumes that no errors escape the redundancy-block tests at level 3,
situations 1.b, 2.a and 2.b do not exist. (The validity of this hypothesis obviously depends
on the performance of these tests. It is the hypothesis most often adopted by those who deal
with communication protocols but, strictly speaking, and in particular in a reliability
study, these situations must be considered.)
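The rejection rule can be illustrated with a minimal Python sketch of a receiver whose window holds a single number, as in strategies I and II. The class name, method names and the use of a plain integer as the acknowledgement are assumptions made for illustration; they do not come from the paper. The sketch only shows that, whether the packet is accepted or rejected, the reply carries the number of the packet just received.

```python
class SingleSlotReceiver:
    """Hypothetical model of a receiver whose window holds one number."""

    def __init__(self, modulus):
        self.modulus = modulus      # numbering modulus (assumed given)
        self.expected = 0           # the single number in the window
        self.delivered = []         # payloads passed to the upper level

    def receive(self, number, payload):
        """Process one packet and return the number carried by the ACC."""
        if number == self.expected:
            # Compatible with the window: accept and slide the window.
            self.delivered.append(payload)
            self.expected = (self.expected + 1) % self.modulus
        # Accepted or rejected, the reply relates to the received number.
        return number


receiver = SingleSlotReceiver(modulus=2)
first_ack = receiver.receive(0, "a")      # accepted
duplicate_ack = receiver.receive(0, "a")  # rejected: already accepted
next_ack = receiver.receive(1, "b")       # accepted
```

In the duplicate case the sender receives the acknowledgement it was waiting for, which corresponds to situation 1.a: the original acknowledgement was lost and the repeated one replaces it.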

2.1.2.5. Coding of the [RP(ACC)]j

The [RP(ACC)]j are represented using the numbers of the [NPQ]i. More precisely, the number
of an [NPQ]i represents:

- the ACC to this [NPQ]i in strategies I, II, IV and V,

- the collective ACC relating to this [NPQ]i in strategy III.


2.1.2.6. Numbering of the [NPQ]i

The numbering must be chosen so that P4j can detect the arrival of [NPQ]i that have already
been accepted. Such repeated arrivals follow from P4i not obtaining the [RP(ACC)]j that
were sent.

To determine the numbering, consider the following situation:

a) suppose first that the number of the first [NPQ]i sent is 0 and that P4j has accepted
this [NPQ]i and also all the [NPQ]i that P4i can send without receiving the [RP(ACC)]j
relating to the [NPQ]i numbered 0; the window F4j characteristic of this situation is shown
below ([a, b, ...] means that buffers are available to receive the [NPQ]i numbered
a, b, ...):

Strategy I            F4j = [1]

Strategy II           F4j = [q+1]

Strategies III and IV F4j = [q+1, q+2, ..., 2q+1]

Strategy V            F4j = [y+1, y+2, ..., y+q+1]

b) suppose further that the [RP(ACC)]j relating to the [NPQ]i numbered 0 is still not
received by P4i, which therefore keeps retransmitting it.

This [NPQ]i numbered 0, assuming its number is received correctly by P4j, must not be
confused with the numbers of the [NPQ]i that P4j can accept. Consequently, the numbering
must be at least:

Strategy I: modulo 2; strategy II: modulo (q+2); strategies III and IV: modulo (2q+2);
strategy V: modulo (y+q+2).
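The minimum moduli follow from the windows listed in a): the retransmitted number 0 must differ from every number in the window, so the numbering needs one more value than the right corner of the window. The short Python sketch below checks this arithmetic. The function names are illustrative assumptions, and q and y are simply passed in as the parameters used in the text.

```python
def receiver_window(strategy, q=0, y=0):
    """Window F4j after number 0 and all packets sendable without its ACC
    have been accepted, as listed in the text for each strategy."""
    if strategy == "I":
        return [1]
    if strategy == "II":
        return [q + 1]
    if strategy in ("III", "IV"):
        return list(range(q + 1, 2 * q + 2))      # q+1 ... 2q+1
    if strategy == "V":
        return list(range(y + 1, y + q + 2))      # y+1 ... y+q+1
    raise ValueError(f"unknown strategy: {strategy}")


def minimum_modulus(strategy, q=0, y=0):
    """Smallest modulus keeping number 0 distinct from the whole window."""
    return max(receiver_window(strategy, q, y)) + 1


moduli = {
    "I": minimum_modulus("I"),
    "II": minimum_modulus("II", q=3),
    "III": minimum_modulus("III", q=3),
    "V": minimum_modulus("V", q=3, y=2),
}
```

With q = 3 and y = 2 (arbitrary example values) this gives 2, q+2 = 5, 2q+2 = 8 and y+q+2 = 7, in agreement with the moduli stated above.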

2.2. Class 2

Given the definition of this class, we immediately obtain strategy I_1 (for mode 1) and
strategies II_1, III_1, IV_1 and V_1 (for mode 2), whose new features with respect to
strategies I, II, III, IV and V respectively are:

a) when P4j receives an [SE], P4j sends an [RP(RET)]j containing a number that represents
the number of the next [NPQ]i expected in sequence,

b) when P4i receives an [RP(RET)]j: in strategy I_1, P4i retransmits the [NPQ]i awaiting
acknowledgement; in strategies II_1 and III_1, P4i undertakes the global retransmission; in
strategies IV_1 and V_1, P4i retransmits only the [NPQ]i whose number is included in the
[RP(RET)]j (this number is that of the first [NPQ]i awaiting acknowledgement),

c) when P4i receives an [SE], it acts exactly as in case b.

In these strategies, since there are two types of [RP]j, the cardinality of the set of
[RP]j is twice that of the strategies defined in section 2.1.

However, as far as mode 1 is concerned, two new strategies (strategies I_2 and I_3) can be
defined for which the cardinality of the set of [RP]j is identical to that of strategy I:

- strategy I_2: it uses the same [RP]j as strategy I (the [RP(ACC)]j type, of dimension 2).
Its characteristics are:

a) when P4j receives an [SE], it sends an [RP(ACC)]j as if it were rejecting an [NPQ]i,

b) when P4i receives an unexpected [RP(ACC)]j, it resends the [NPQ]i awaiting
acknowledgement (in strategy I_2 an unexpected [RP(ACC)]j has the semantics of a
retransmission request),

c) when P4i receives an [SE], it acts exactly as in case b.

- strategy I_3: this strategy can be defined by assuming that the redundancy used at level 3
detects all errors affecting the content of the [NPQ]i and of the [RP]j. Under this
hypothesis, the levels below level 4 only lose information and never let erroneous
information through (in particular, when P4j receives an [NPQ]i whose number is
incompatible with window F4j, that [NPQ]i is one that was previously accepted but whose
[RP(ACC)]j was lost).

Strategy I_3 uses two types of [RP]j, each of dimension 1: an [RP(ACC)]j, used to
acknowledge any [NPQ]i, and an [RP(RET)]j, used to request the retransmission of any
[NPQ]i. Its operating characteristics are:

a) when P4j receives an [SE], it sends the [RP(RET)]j,

b) when P4i receives the [RP(RET)]j, it therefore resends the [NPQ]i awaiting
acknowledgement,

c) when P4i receives an [SE], it acts exactly as in case b.

2.3. Summary

The summary table of Figure 3 lists the different strategies with, on the one hand, the
elements needed to define them and, on the other hand, their properties.

CONCLUSION

In our opinion, the presentation given in this paper brings out two major points of
interest.

First, the use of a three-level hierarchical model (the level requesting the service for
which the retransmission strategies are implemented; the level that implements these
strategies; the transmission machine used by that level to carry them out) has made it
possible to give, to our knowledge for the first time, a precise and exhaustive presentation
of retransmission strategies. Second, this presentation of the different strategies is an
essential step before moving on to two later steps, namely the implementation of these
strategies and their formal modelling [...].




BIBLIOGRAPHY

BARTLETT, 1969   "A note on reliable full-duplex transmission over half duplex links",
                 Commun. Ass. Comput. Mach., vol. 12, May 1969.

BURTON, 1972     "Errors and error control", Proc. IEEE, vol. 60, no. 11, November 1972.

EASTON, 1979     "Design choices for selective-repeat retransmission protocols", Research
                 Report, IBM Thomas J. Watson Research Center, Yorktown Heights, New York.

GRAY, 1972       "Line control procedures", Proc. IEEE, vol. 60, no. 11, November 1972.

LUCKY, 1968      "Principles of data communication", New York, McGraw-Hill Book Company
                 Inc., 1968.

LYNCH, 1968      "Reliable full duplex file transmission over half duplex telephone lines",
                 Commun. ACM 11, 6, June 1968.

MACCHI, 1979     "Téléinformatique : transport et traitement de l'information dans les
                 réseaux et systèmes téléinformatiques", Dunod, 1979.

METCALFE, 1973   "Packet communication", MAC TR-144, Massachusetts Institute of Technology,
                 December 1973.

METZNER, 1977    "A study of an efficient retransmission strategy for data links", NTC 77
                 Conf. Rec., pp. 3B:1-1 to 3B:1-5.

NERI, 1977       "A reliable control protocol for high-speed packet transmission", IEEE
                 Trans. on Communication, October 1977.


[Figure 1. Hierarchical model of the system considered: levels 0 to 5 on each side, with
level 0 carrying the electrical signals; the legend distinguishes virtual links from real
links.]


28-1 


PRACTICAL  ASPECTS  WHICH  APPLY  TO  MIL-STD-1553B  DATA  NETWORKS 

by 

I.  Moir 

Military  Systems  Engineer 

Smiths  Industries  Aerospace  &  Defence  Systems  Company 
Cheltenham  Division,  Bishops  Cleeve,  Cheltenham 
Gloucestershire,  GL52  4SF.  England, 
and 

Mr.  P.A.  Duke 

Senior  Avionics  Systems  Engineer 
British  Aerospace,  Brough 
North  Humberside,  UK 


SUMMARY 

This  paper  discusses  practical  aspects  which  apply  when  attempting  to  design  a 
complex  avionics  system  based  on  a  Data  Bus  Architecture.  An  example  of  such  a  system  is 
the  Stores  Management  and  Weapon  Aiming  system,  and  this  is  discussed  in  detail. 


1.  INTRODUCTION 

1.1  The  Arrival  of  the  Data  Bus 

The  data  bus  offers  many  potential  advantages  over  hardwired  or  dedicated  data 
transmission  systems  in  the  design  of  Avionic  Systems.  Systems  are  interconnected  by  a 
single  or  redundant  twisted  pair  of  wires  via  standard  interfaces,  so  reducing  inter 
system  wiring  and  the  types  and  numbers  of  interfaces.  The  quantity  of  data  transferred 
no  longer  has  a  direct  influence  on  the  inter  system  wiring  and  distributed  computing 
becomes feasible. However, in spite of the obvious advantages of a data bus system there
are certain limitations which could be the source of much heartache to the system designer.
Problems may result from transmission delays, digital sampling noise and the fundamental
upper limit on data item-data rate product. Also, since interconnection between systems
is via a common path, faults in the communication medium can have serious consequences,
and therefore redundancy and error correction techniques need to be employed.

1.2  Designing  a  New  System 

When  starting  to  design  a  new  avionic  system  based  on  the  Data  Bus  principle  a  number 
of  system  configurations  can  be  devised.  The  functional  areas  can  be  allocated  to  hardwire 
units  and  the  interface  signals  rationalised.  However,  the  resulting  system  could  present 
a  high  technical  risk  unless  practical  experience  has  been  gained  with  the  system,  and  the 
inevitable  design  problems  identified. 

In order to investigate the limitations of data bus systems, to gain practical
experience  of  distributed  computing  centres  interconnected  by  data  buses  (in  advance  of  a 
new  aircraft  project)  and  to  stimulate  manufacture  of  bus  compatible  equipments,  an 
Avionic  Systems  Rig  facility  has  been  established  in  the  UK,  at  British  Aerospace,  Brough. 

2.  THE  AVIONIC  SYSTEMS  RIG 

2.1 History

The  need  for  an  Avionics  Systems  Rig  became  apparent  during  the  1970s,  the  intention 
being  that  the  Rig  would  provide  a  tangible  and  cost  effective  means  of  risk  reduction  in 
the  development  of  future  aircraft  avionic  systems.  The  purpose  is  to  demonstrate,  in 
ground  rig  form,  total  system  integration  and  system  architectural  design  concepts,  making 
use  of  the  considerable  technological  progress  which  has  been  achieved  in  recent  years, 
and  in  particular  of  the  data  bus. 

2.2  Objectives  and  Aims 

The next combat aircraft project was expected to appear during 1987 to 1990. Current
uncertainties about timing and form of such an aircraft make a rig programme within the
spirit of that originally conceived even more valid to exploit and practically apply
developing technology.

The  general  objectives  for  the  rig  are: 

(a)  To  provide  a  focal  point  for  the  design,  development  and  practical  demonstration  of 
a  fully  integrated  total  system  using  a  multi  bus  architecture  with  sub  system 
integration,  asynchronous  data  transfers  and  total  system  executive  control. 

(b) To support the design and development of systems for a fixed wing tactical combat
aircraft  to  enter  service  towards  the  end  of  the  present  decade. 

(c) To provide a stimulus for the production of equipment compatible with DEF Stan. 00-18
(Part 2) (i.e. MIL-STD-1553B).




(d)  To  investigate  the  specification,  procurement  and  management  procedures  for  a 
highly  integrated  system. 

(e)  To  effect  an  improvement  in  total  system  fault  diagnosis. 

(f) To develop a capability for the control and management of software procurement and
adherence  to  quality  assurance  standards. 

(g)  To  develop  systems  which  will  be  properly  matched  to  the  pilot's  requirements  and 
capabilities. 

2.3  Avionics  Industry  Participation 

It was recognised from the outset that the UK avionics industry should be closely
involved  with  the  project.  This  has  been  achieved  during  the  planning  process  through 
consultation  with  UK  avionics  companies.  A  working  group  comprising  senior  engineers 
from  a  number  of  these  companies  has  been  set  up: 

(a)  To  ensure  that  the  Rig  reflects  current  developments  in  avionic  systems  technology. 

(b) To help in the communication of results and experience from the Rig programme to the
Avionics  Industry. 

(c)  To  provide  a  forum  for  the  discussion  of  Rig  procurement  difficulties. 

(d)  To  provide  a  forum  for  the  discussion  of  standards  applicable  to  the  rig. 

During  the  early  stages  of  the  programme  the  working  group  has  assisted  in 
establishing  an  overall  system  architecture,  and  producing  outline  specifications  for 
its  sub  systems. 

2.4  The  Architecture  of  a  Multi  Bus  Avionic  System 

The  overall  systems  architecture  was  derived  in  the  light  of  studies  carried  out  by 
the  UK  aircraft  and  avionics  industry  over  a  number  of  years.  One  of  the  studies  was  to 
develop  a  systems  architecture  for  an  offensive  support  aircraft.  This  was  carried  out 
using  a  'top  down'  approach  to  system  design  which  led  to  functional  grouping  of 
equipments.  These  functional  groupings  were  found  to  give  advantages  in  the  comprehension 
of  system  operation,  the  specification  of  system  performance,  and  in  equipment  procurement 
and  management.  The  functional  groups  derived  were: 

(a)  Aircraft  Group 

(b)  Pilot  Group 

(c)  Navigation  Group 

(d)  Mission  Group 

The Aircraft Group comprises those sub systems which are primarily concerned with
keeping the aircraft flying safely, i.e. they are safety critical. It contains the
Flight Control and General Aircraft systems.

The  Pilot  Group  contains  systems  and  functions  which  interface  directly  with  the 
pilot,  such  as  the  cockpit  controls  and  displays  together  with  those  such  as  the  avionics 
bus  controller  which  provide  a  total  system  control  function. 

The  Navigation  Group  contains  systems  and  functions  which  determine  the  position  of 
the  aircraft  and  where  it  is  to  go. 

The  Mission  Group  embraces  all  those  functions  that  are  concerned  with  attack, 
defence  and  stores  management. 

The  systems  within  these  functional  groupings  will  communicate  over  the  avionics  bus 
as  shown  in  Figure  1. 

Whilst  communication  between  groups  takes  place  over  the  Avionics  Bus  it  was  found 
that for geographically distributed sub systems and units within groups additional data
buses  within  the  group  were  required.  A  particular  instance  which  will  be  considered  in 
more  detail  later  in  this  paper  is  the  Stores  Management  function  within  the  attack  area 
of  the  mission  group. 

The  architectural  configuration  was  driven  mainly  by  availability  and  safety 
requirements. One requirement which was a strong driver was that wherever possible no
single  failure  should  cause  a  mission  abort.  Another  was  that  single  or  combined  failures 
should  have  a  very  low  probability  of  hazarding  the  aircraft  or  friendly  personnel  on  the 
ground.  Groups  of  systems  which  have  this  safety  requirement  are  shown  at  the  bottom  of 
Figure  1. 



2.5  Reliability  Requirements 

The general reliability requirement for a mission critical avionic function is
that no single failure within that function should prevent the completion of a mission.
This requirement can be satisfied by simplex systems provided that an alternative (or
reversionary) source of the data produced by that system is available. A second failure
occurring within an avionic function can result in the failure of that function and cause
the mission to be abandoned.

For a safety critical function there is an additional requirement that no single
failure  within  that  function  shall  result  in  a  hazardous  output. 

A given function may involve several systems (e.g. the Weapon Aiming Function may
call  for  data  from  the  Radar,  Electro  Optical  Sensors,  Navigation  System,  Air  Data  System 
and  Pilot) .  The  communication  path  between  these  systems  needs  to  reflect  the  reliability 
requirements  of  a  given  function.  Hence  the  avionics  bus,  which  only  transfers  signals 
which  are  classed  as  being  mission  critical  is  itself  mission  critical  and  must  be  at 
least  dual  redundant.  Within  the  weapon  control  and  release  area  however  two  functions 
with  different  reliability  requirements  come  together  to  produce  the  successful  release 
of  a  weapon.  They  are  the  weapon  aiming  function,  which  is  mission  critical,  and  the 
weapon  release  function,  which  is  safety  critical. 

A discussion concerning the design options available for a future weapon system forms
the main contents of this paper. It has been written jointly by BAe Brough and Smiths
Industries, who have for some time been working together on the design and use of
MIL-STD-1553B data bus transmission systems.

3. THE WEAPON SYSTEM

3.1  Introduction 

On  the  majority  of  aircraft,  weapons  are  carried  externally,  on  wing  and  fuselage 
pylons.  There  may  be  eleven  or  more  weapon  stations  and  each  may  carry  more  than  a 
single  weapon.  When  considering  a  new  weapon  system  a  major  design  consideration  is  how 
to  communicate  between  the  sensors  and  weapon  system  processors  and  the  weapons.  Many 
alternative  configurations  are  possible  and  it  has  been  argued  that  it  would  be  more  cost 
effective  to  employ  a  current  digital  data  transmission  system  which  has  been  proved, 
than  to  develop  a  new  one.  However,  this  paper  is  exclusively  concerned  with  data  bus 
issues  and  therefore  existing  data  transmission  systems  have  been  excluded. 

The signals required by a weapon can be divided into three general classes: Aiming,
Arming  and  Release. 

3.2  Weapon  Aiming 

The  weapon  aiming  system  will  obtain  information  from  a  number  of  sources  to  provide: 

(a)  Attack  geometry  to  the  flight  control  system. 

(b)  Display  Information  to  the  cockpit. 

(c)  Release  cues  to  the  weapon  release  system. 

(d) Guidance signals to the weapon heads.

Many weapon types need to be told where their targets are. Additionally they may
require such data as aircraft altitude, velocity, target velocity etc. This data is
dynamic and is generated by the combined operation of several systems (e.g. Navigation,
Radar and EO systems and the pilot). These signals demand a high data rate and are mission
critical.

As  the  guidance  signals  result  from  the  combined  operation  of  a  number  of  systems, 
an  early  consideration  involves  the  distribution  of  processing  between  the  various 
systems  which  contribute  to  the  weapon  aiming  function.  The  centralised  and  distributed 
weapon  aiming  system  architectures  are  shown  schematically  in  Figures  2,  3  and  4. 

A centralised aiming system has the advantage of having all the data required for
aiming calculations available within a single unit. However, this data must be
transferred from the sensors, and using a data bus as shown in Figure 3 can introduce
delays and digital sampling noise onto the highly dynamic data. The loss of accuracy and
noise could be removed by using dedicated links from the sensors but this solution should
be discouraged as the proliferation of dedicated links destroys the advantages of using
data buses.

If  we  consider  air-to-air  and  air-to-ground  weapon  aiming  separately  then  we  find 
that  various  systems  are  already  performing  many  of  the  calculations  required  for  weapon 
aiming.  Thus  with  minimal  additional  processing  the  radar,  for  example,  could  perform 
the  majority  of  the  air-to-air  Weapon  Aiming  computation  and  the  Navigation  system  could 
perform the majority of the air-to-ground Weapon Aiming computation. On balance the
distributed processing option shown in Figure 4 is preferred.




In each case the guidance signals must be transferred to all weapons. To add 11 or
more extra remote terminals onto the Avionics bus would exceed the limit for a single
bus and for this reason alone we are forced to add an extra bus to transfer the aiming
data.
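The terminal-count argument can be checked with a few lines of Python. MIL-STD-1553B uses a 5-bit remote terminal address with address 31 reserved for broadcast, which leaves 31 addressable remote terminals on one bus. The figure of 24 terminals already on the Avionics bus is a purely hypothetical value chosen for illustration; the paper does not state the actual count.

```python
# 5-bit remote terminal address; address 31 is reserved for broadcast,
# leaving addresses 0 to 30 for individual remote terminals.
MAX_REMOTE_TERMINALS = 31


def fits_on_single_bus(existing_terminals, extra_terminals):
    """Return True if the combined terminal count fits on one bus."""
    return existing_terminals + extra_terminals <= MAX_REMOTE_TERMINALS


# Hypothetical example: 24 terminals already present, 11 weapon stations.
avionics_bus_terminals = 24
weapon_stations = 11
fits = fits_on_single_bus(avionics_bus_terminals, weapon_stations)
```

With these assumed numbers the total of 35 exceeds the limit, so the weapon stations need a bus of their own.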

3.3  Weapon  Arming 

Arming signals include bomb and missile fuze selection, missile priming functions
such  as  thermal  battery  initiate,  and  the  switch  on  of  aircraft  supplied  electrical  power. 
These  are  generally  discrete  signals  which  place  the  weapon  in  an  active  state,  initiate 
electro-explosive  devices  or  control  the  action  of  electro  explosive  devices.  They 
require  a  low  data  rate  but  are  both  mission  and  safety  critical. 

3.4 Release Signals

In  general  there  are  three  different  types  of  release  signal,  which  are  classified 
according  to  their  form  at  the  aircraft  to  weapon  interface. 

-  Type  1  Release  Signal 

In  order  to  release  a  bomb  it  is  necessary  to  apply  a  high  current  discrete  to  the 
Ejector Release Unit mounted in the aircraft pylon or multiple carrier unit. No signal
passes  from  the  aircraft  to  the  external  store. 

-  Type  2  Release  Signal 

In  order  to  release  a  missile  it  is  necessary  to  apply  a  high  current  discrete 
signal  to  the  rocket  motor  igniter.  This  discrete  must  be  passed  from  the  aircraft  to 
the  missile. 

-  Type  3  Release  Signal 

Many small weapons are carried in large numbers on a special carrier, with release
under the control of a unit mounted within the weapon carrier. The release signal is a
low current discrete or digital data word which is passed from the aircraft to the
carrier.

Whichever  type  is  required  the  release  signal  is  of  very  low  data  rate,  mission 
critical  and  safety  critical  and  will  have  to  be  applied  at  a  precise  time. 

With  the  addition  of  the  weapon  aiming  bus  we  must  now  consider  the  weapon  release 
and  arming  functions  and  the  options  available  for  their  implementation. 

4.  BUS  REALISATION  OPTIONS  AND  THE  EFFECT  UPON  SUBSYSTEM  DESIGN 
4.1  System  Requirements 

(a)  Safety 

(i)  'No single failure shall result in a safety critical signal being generated.'
This generally means that signals such as release of stores, etc. must
operate effectively in a duplex mode, i.e. two independent signals must be
present before release or other safety critical function can occur.

(ii)  'The probability of dormant or multiple failure modes resulting in a safety
critical output shall be extremely small.' (The figure of 10^-7 per
flight hour is often quoted.)

(b)  Availability 

Existing Stores Management Systems typically specify that the probability of a
failure to operate should be not greater than a certain figure, and in this context
a figure of 10^-4 per flight hour is often quoted.

Bearing in mind that current systems are usually designed to provide alternative
(reversionary) methods for releasing weapons and that it should be the aim as far
as possible to minimise the need for such reversion in future designs (which
utilise data buses), the requirement for availability will be somewhat more
stringent than the above figure suggests. However, the main avionics system will
have reversionary modes enabling weapon release points to be computed using various
sensors subsequent to sub system failures. It is essential therefore that the
Stores Management System should be capable of exploiting this capability. A
requirement for further methods of weapon release which exclude communication via
the Avionics Bus needs to be questioned critically since this will involve the
introduction of dedicated hardware with its appropriate cost and weight penalties.

(c)  Survivability 

The term survivability means mission survivability in an environment where there is
a high probability of battle damage being experienced. Given that the effective
release of the stores and missiles would be inadequate without the support of some
of the aircraft sensors and other sub systems, then crude reversionary mechanisms
would be unlikely to contribute significantly to mission success. The potential of
a dual redundant or similar data bus for providing both survivability and
availability therefore needs to be fully exploited to match the avionic system
capability.

4.2  Redundancy  Options 

The options available for dual operation may be summarised as follows:

(a)  Duplex  Redundancy 

Both elements of a duplex redundant system have to be fault free to obtain full
system performance. This may be likened to Figure 5A where two switches are
connected in series.

(b)  Dual  Redundancy 

In  a  dual  redundant  system  either  of  the  two  elements  can  perform  the  same  specified 
system  function.  This  form  of  redundancy  may  be  Active  (e.g.  cyclic  redundancy,  in 
which  the  elements  are  switched  from  one  to  the  other  and  back  again)  or  Passive 
(e.g. stand by redundancy, in which one element is active until a failure occurs, in
which case the alternative element is used). Dual redundancy is akin to the
parallel switches shown in Figure 5B.

The  safety  requirements  of  a  weapon  release  system  dictate  the  operation  of  the 
elements  in  a  duplex  manner.  The  availability  and  survivability  requirements  demand 
some  form  of  dual  redundancy.  Therefore  the  overall  requirement  will  necessitate  both 
duplex  and  dual  redundant  features  such  as  shown  in  Figure  5C. 
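
The series and parallel switch analogy of Figures 5A to 5C can be expressed as simple Boolean logic. The following Python sketch is illustrative only: the function names and the four-signal arrangement (two independently generated signals on each of two lanes) are assumptions made here, not details given for Figure 5C. It counts how many single faults defeat the combined arrangement, in either the safety or the availability sense.

```python
def duplex(a: bool, b: bool) -> bool:
    """Series switches (Figure 5A): both signals must be present."""
    return a and b


def dual_redundant(lane1: bool, lane2: bool) -> bool:
    """Parallel switches (Figure 5B): either lane is sufficient."""
    return lane1 or lane2


def duplex_dual_redundant(a1: bool, b1: bool, a2: bool, b2: bool) -> bool:
    """Two duplex lanes in parallel (an assumed reading of Figure 5C)."""
    return dual_redundant(duplex(a1, b1), duplex(a2, b2))


def single_faults_causing_release() -> int:
    """Count single spurious signals (one True, the rest False) that release."""
    count = 0
    for i in range(4):
        signals = [False] * 4
        signals[i] = True
        if duplex_dual_redundant(*signals):
            count += 1
    return count


def single_faults_blocking_release() -> int:
    """Count single missing signals (one False, the rest True) that block release."""
    count = 0
    for i in range(4):
        signals = [True] * 4
        signals[i] = False
        if not duplex_dual_redundant(*signals):
            count += 1
    return count
```

Under these assumptions both counts are zero: the duplex element guards against a single spurious signal, while the dual redundant element guards against a single loss of signal.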

4.3  Bus  Realisation 

Duplex operation in a time division multiplexed digital transmission system
(i.e. 1553B) may be implemented by time separation of identical messages down a data bus
instead of duplicating hardware. If the independently generated duplex signals are
suitably coded into separate words, transmitted over a single data highway to an RT
and subsequently decoded back into true duplex signals, then it can be shown that for a
bit error rate of 10^-12, the occurrence rate of a valid duplex word error is of the order
of 10^-18 per hour. Indeed even if noise bursts occur at higher levels than this, then
provided that reasonable temporal separation of the duplex words is made, the occurrence of
error is still highly improbable. It seems reasonable then that techniques such as
this may be implemented to protect against inadvertent release due to noise.

If  the  duplex  words  representing  the  release  signal  are  dissimilar  in  bit  structure, 
then it is reasonable to suppose that a single failure mode will not be able to generate
both words spuriously. To prove this, however, will require a rigorous failure mode
analysis  on  the  simplex  part  of  the  system,  and  this  can  only  be  done  on  a  reasonably 
detailed  design.  Therefore  whilst  preliminary  investigation  indicates  that  safety 
critical  signals  may  be  transmitted  satisfactorily  via  a  dual  redundant  bus  system, 
acceptance  of  the  system  will  lean  heavily  on  a  detailed  failure  mode  analysis  followed 
by  supporting  experimental  evidence. 
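
The recombination of two dissimilar, time-separated words into a single release decision can be sketched as follows. This is a minimal illustration in Python and not the decoding logic of any actual pylon interface unit: the two 16-bit codes, the minimum separation and the validity window are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical codes, chosen as bitwise complements of each other so that
# no all-zeros or all-ones fault pattern can reproduce both words.
RELEASE_WORD_A = 0xA5C3
RELEASE_WORD_B = 0x5A3C
MIN_SEPARATION_US = 500    # assumed minimum temporal separation
MAX_WINDOW_US = 20_000     # assumed maximum age difference of the pair


@dataclass
class DuplexDecoder:
    """Accepts a release only when both dissimilar words have been seen."""
    time_a: Optional[int] = None
    time_b: Optional[int] = None

    def receive(self, word: int, time_us: int) -> bool:
        """Record one received word; return True when a valid pair is complete."""
        if word == RELEASE_WORD_A:
            self.time_a = time_us
        elif word == RELEASE_WORD_B:
            self.time_b = time_us
        else:
            return False                      # any other word is ignored
        if self.time_a is None or self.time_b is None:
            return False                      # only one half of the pair so far
        gap = abs(self.time_b - self.time_a)
        return MIN_SEPARATION_US <= gap <= MAX_WINDOW_US
```

A repeated copy of one word never completes the pair, and two words arriving closer together than the assumed separation (as a single noise burst might produce) are rejected.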

A high weapon delivery availability (failure to operate not greater than 10^-4 per
hour), consistent with a high level of signal integration into the bus system, makes it
necessary to analyse the proposed system on the basis of random failure rates. The
presence of dual computing and a dual redundant bus system should enable this requirement
to be met in the configuration shown in Figure 6. The attractions of this configuration
are as follows:

(a)  Duplex  operation  may  be  accommodated  using  one  set  of  hardware  by  transmitting 
independently  generated  signals  down  a  single  highway  and  combining  into  true 
duplex  signals  in  the  pylon  interface  unit. 

(b)  The existence of dual redundancy within the 1553B buses and RT hardware permits
availability and survivability requirements to be met.

(c)  The architecture includes separate signal paths and processing areas for both
weapon aiming and weapon release functions, which may aid certification. However,
the option is available to combine both aiming and release functions on the same
bus should experimentation suggest this to be a sensible alternative.

5.  INTERFACING  WITH  EXTERNAL  STORES 

As a result of the above discussion we now assume separate Weapon Aiming and Weapon
Release Buses (remembering that current weapon release DDTS systems are excluded from
this paper). There remains one link in the chain from the sensors and pilot to the
weapons: the interface between the weapon buses and the weapons. Here again
several design options, involving the partitioning of processing elements, are available.

The use of data buses offers the possibility of obtaining a standard interface
connector such as that proposed in MIL-STD-1760. This is highly desirable as it will
improve interoperability. However, as we shall see, this is not easy to achieve.




5.1  External  Store  Types 

A wide range of external stores is now carried by aircraft, and the pace of new
weapon development and the complexity of those weapons are continually increasing.
Store types which a new aircraft can be expected to carry include:

-  conventional  'iron'  bombs 

-  cluster  or  dispenser  weapons 

-  short  range  and  medium  range  IR  and  Radar  guided  missiles 

-  Fuel  Tanks 

-  Electro Optic sensor and designator pods

-  Electronic  warfare  pods 

The  signals  required  by  these  stores  cover  many  different  types,  but  typically: 

-  28  V  low  current  and  high  current  discretes 

-  Analogue  signals 

-  Digital  data  highways  of  various  types 

-  28  V  and  115  V  power  supplies 

-  Video 


-  R.F. 

They  are  generated  at  each  store  station  within  Pylon  Interface  Units  from  signals 
transmitted  via  the  Weapon  Release  and  Weapon  Aiming  buses. 

The  aim  of  any  new  system  should  be  to  provide  a  standard  interface  which  can 
accept  current  and  future  store  types  with  the  minimum  of  hardware  modification. 

To  provide  an  example  of  the  interface  design  process  and  the  options  available  we 
will  look  at  the  installation  of  an  AIM-9  missile.  This  is  a  typical  guided  weapon  and 
can  be  carried  singly  or  on  multiple  carriers. 

5.2  Pylon/Launcher  Configurations 

Figure 7 shows a possible configuration for a simple weapon such as the AIM-9
Sidewinder. Weapon aiming (head slaving) data is fed into the pylon processor via a
dual redundant weapon aiming bus. Firing signals to the ERU are fed from both release
processors in order that emergency jettison may be achieved in the event of either
processor failing. Discrete and head aiming signals are routed to the store in a
conventional manner and existing launchers could be used. The advantages and
disadvantages of this scheme are:

Advantages 

-  Uses  existing  launcher  hardware,  therefore  no  cost  increment 

Disadvantages 

-  Non standard launcher/store interfaces

-  Power lines and head aiming routed via pylon processor

-  Expensive in RT hardware (2 RTs for head aiming signals might not be justifiable)

Figure 8 shows a modified arrangement where the aiming signals are routed directly
to the launcher without interfacing with a processor in the pylon. This arrangement
would require a Remote Terminal and processor in the launcher to receive and generate head
aiming signals. This would require a simplex RT for simplex generation, or a dual
redundant RT if dual redundant weapon aiming generation were preferred or justified.

The  advantages  and  disadvantages  of  this  layout  are: 

Advantages 

-  Discrete and head aiming signals not routed via pylon processors, therefore
cheaper aircraft equipment

-  Common pylon to launcher interface possible

-  Reduction  in  hardware  possible  (1  RT  for  head  aiming  may  be  justifiable) 


Disadvantages

-  Requires  new  or  modified  launcher,  with  resulting  cost  increase 

-  New or modified launcher not common with old launcher at pylon to launcher
interface and therefore not compatible with existing aircraft

Figure 9 shows a multistore launcher capable of carrying a number of smart stores,
each of which is interfaced to the launcher by means of a standard stores interface.
As for the previous option the head aiming and discrete lines are consolidated in the
launchers before being routed to the store(s). The advantages and disadvantages of this
configuration are:

Advantages 

-  Could  be  standard  with  new  AIM-9  launcher  interface 

-  Standard  launcher/store  interface 

-  Dual redundant 1 Mb bus routed to multiple 'smart' stores

Disadvantages 

-  Expensive  in  terms  of  hardware  -  bus  extender  and  standard  interfaces 

-  Several  missiles  now  share  one  electronics  unit  which  becomes  a  common  failure 
point 

It may be seen that this configuration could easily be made compatible with the
'new' AIM-9 launcher configuration just described. Hence standard pylons could be
interfaced with interchangeable launchers, permitting a wide mix of weapon options to be
carried. Furthermore, the use of standard stores interfaces at the launcher/store
interface means that a mix of smart or dumb weapons of differing National ordnance could
be carried. This feature would greatly enhance the effectiveness of NATO airborne
tactical forces in any conflict. The penalty paid for this interoperability is
however the introduction of new and more complex launcher hardware including the
incorporation of micro-processor hardware.

The  technicalities  associated  with  the  'bus  extender'  facility  shown  in  the  multiple 
MIL-STD-1760  launcher  are  presently  being  examined  by  Smiths  Industries,  in  order  to 
fully  identify  and  quantify  the  trade-offs  which  are  involved. 

It may be seen that each of the three simplified options has advantages and
disadvantages. Depending upon the need for carriage of smart weapons, and the need for
standardization of the stores interface, trade-offs exist in the areas of hardware
complexity, weight, reliability, integrity and cost (including cost of ownership). The
user will therefore need to list the relative priorities in these areas and quantify the
trade-offs in order to assure maximum cost-effectiveness of the overall weapons system.

CONCLUSION 

The paper has served to describe the work being undertaken in the UK industry on
the interfacing of MIL-STD-1553B buses. The development of an integrated systems rig
has been outlined and the aims of the rig identified. The use of data buses for weapon
aiming and weapon release purposes has been described in detail. The trade-offs which
exist in meeting the requirements of a standard launcher/stores interface have also been
taken into consideration.


FIG. 1  GENERALISED  SYSTEM  ARCHITECTURE 


FIG. 2  GENERALISED  WEAPON  AIMING  FUNCTION 


FIG. 3  A  CENTRALISED  WEAPON  AIMING  SYSTEM 


[Figure 4 block labels: radar + air-to-air weapon aiming computation; nav. system + air-to-ground weapon aiming computation; pilot's controls and displays; weapon aiming bus controller; avionics bus; weapon aiming bus; four weapons.]

FIG. 4  DISTRIBUTED WEAPON AIMING SYSTEM


[Figure 5 panels: A. Duplex (System A and System B in series); B. Dual redundant (Lane 1 and Lane 2 in parallel); C. Duplex dual redundant (Lane/Bus 1 and Lane/Bus 2).]

FIG. 5  DUAL SYSTEM OPERATION OPTIONS - SIMPLIFIED SCHEMATIC


[Figure 6 block labels: 'dumb' store; 'smart' store.]

FIG. 6  GENERALISED WEAPON SYSTEM CONFIGURATION


[Figure legend for the pylon/launcher options: RT - Remote Terminal; ERU - Ejector Release Unit; EU - Electronics Unit; ■ - AND; WA bus and WR bus (weapon aiming and weapon release buses). Panels labelled OPTION 1 and OPTION 2.]


THE  TRAFFIC  FLOW  IN  A  DISTRIBUTED  REALTIME  COMPUTING  SYSTEM  (RDC-SYSTEM) 

WITH  A  FIBEROPTIC  RINGBUS  SYSTEM 

Dirk  Hager  and  Reinhard  BShre 

Fraunhofer Institut für
Informations- und Datenverarbeitung (IITB)
Sebastian-Kneipp-Str. 12/14
D-7500 Karlsruhe 1, Germany


ABSTRACT 

The new generation of automatic systems is essentially characterized by distributed
multicomputer systems. The architecture is based on distributed microcomputer stations linked
together by a bus system. These systems offer many more design alternatives than
conventional single or multicomputer systems; the danger of obtaining bottlenecks of system
performance is considerably greater than it was when using functional modules operating
independently and simultaneously. Therefore, mathematical modelling of bus-linked
multicomputer systems and the experimental evaluation of these models in online operation by means
of measurements is of increasing importance.

In  this  paper  the  RDC-system,  a  realtime  computing  system  developed  by  the  IITB  and  the 
traffic  flow  on  its  fiberoptic  ringbus  system  are  presented. 

1.  INTRODUCTION 

The new generation of automatic systems is essentially characterized by distributed
multicomputer systems (Fig. 1 and /1/). In systems like this use is made of a hierarchical
decomposition of all tasks of the automating system. The decomposed tasks are distributed
among a hierarchy of autonomous subsystems communicating with each other. These subsystems
are realized by means of microcomputer stations today. Referring to an automatic fire
control system we have for instance three functional levels performing the following tasks:

weapon control level, controlling directly the weapons and the equipment in the field;
fire control level, coordinating several weapon control systems;
command level, for the orientation of all weapons in the battle field.


Fig. 1: Automatic control system
(e.g. fire control system)




Fig. 2 shows a typical architecture of such automating systems. There we see the whole
system which is composed of several subsystems called equipments. The equipments are
coupled with each other by means of bus couplers to a bit serial system bus. This system
bus for instance makes the connection to a central master control panel. The equipments
again comprise subsystems which are called devices. These devices are linked together by
the equipment bus, in general a parallel bus system. The devices in turn are built up of
printed circuit boards which are connected by a parallel device bus. In the next step
down we find components on the boards tied together again by a parallel bus system called
board bus. And finally, we see clusters of I/O-devices which are connected with each other
and the boards by an I/O-bus in turn.


Fig.  2:  Bus  hierarchy  in  a  distributed  automating  system 


Let's summarize:

The hierarchy of the automatic control system is mapped onto a hierarchy of hardware
subsystems in a dual manner /2/. Clusters of subsystems are pooled together by means of bus
systems and form the subsystems belonging to the adjacent level of the hierarchy higher
up. In any case, the functions of the subsystems should be performed largely autonomously
and the need for communication between these autonomous subsystems should be
minimized in order not to get bottlenecks of system performance and/or system availability.
Typical lengths of the shown busses are several 0.1 meters for the device bus, about
10 meters for the equipment bus and up to several kilometers for the serial system bus.
Distributed systems as described above are particularly important in cases in which the
technical process is locally distributed. In these cases it is possible to limit the effects
of failures locally and to replace the crashed functions by other parts of the system.

This principle of error recovery or system reconfiguration with graceful degradation uses
the principle of dynamic and functional redundancy.




2.  THE DISTRIBUTED, FAULT-TOLERANT RDC-SYSTEM

The Fraunhofer Institute for Information and Data Processing has developed the "Really
Distributed Control Computer System", called the RDC-system /3/, /4/. In this system
a large number of distributed microcomputer stations communicate with each other via a
fiberoptic ringbus system. Up to now five RDC-systems have been put into operation in different
industrial applications, one of them for the closed loop control of 28 pit furnaces. The
latter system has been working at the iron works of THYSSEN AG since June 1979. This
application will be described in the following; abbreviations used in the text and the pictures
are listed in Table 1.


LµP        Communication processor
LASP       Working storage of the LµP
SEA        Transmitter/receiver adaption
LSEM       Light transmitter/receiver module
PµP        Process control processor
PASP       Working storage of the PµP
BS/SR      Bus switch unit/fault diagnosis
NTST       Supply control
AA         Digital-to-analog converter
AE         Analog-to-digital converter
BA         Binary output
BE         Binary input
SBF/SBFT   Local control panel
E/A        I/O-devices
M          Sensors
ST         Actuators
DSG        Alphanumeric terminal
MB         Magnetic tape recorder
MSP        Mass storage in the master control room
EAF        Color screen I/O panel
ZE         Computer for the master control operation with the color screen I/O panel
PR 310     Microcomputer SIEMENS 310


Table 1: Abbreviations


2.1  HARDWARE STRUCTURE

The hardware structure of the RDC-system is shown in Fig. 3. Here we see the distributed
microcomputer stations with the microcomputers (PµP) performing the control tasks, the
I/O-devices (E/A) and the measuring and actuating elements at the technical process (M, ST).

Each station is connected by means of a special communication processor (LµP) to a bit
serial fiberoptic ringbus system. In the middle of each station there is a special
device (BSU) which performs the on-line error detection within the station and at its
interfaces, initiates the status reporting messages conveyed by the LµP to the whole system
and isolates the faulty parts. In this way a systemwide distributed on-line fault
diagnosis is performed and the reconfiguration procedure is started. In the case of the
pit furnace application the neighbour station takes over the functions of the faulty station
and controls both technical processes simultaneously, but with degraded performance for
each of them. After repair, system regeneration takes place automatically as well.
Moreover, we see at the top of Fig. 3 a double computer system. One of these computers
performs all functions of displaying the statuses of the technical process and of the automating
system, the interactive operation by the operators and all documentation and data
gathering tasks essential for the next levels in the functional hierarchy. It is
called the EAF-system. The other computer fulfills in normal cases the task of dynamic loading




If one of these two computers fails the latter tasks are shut down and the remaining
facilities are used to maintain the more important tasks of displaying the technical process
and the interaction of the operating personnel. Besides the facility of central operation
there exists the possibility of distributed input and output actions by an operator at the
RDC-stations in the field. For this reason, each RDC-station has a local control panel.

Fig. 4 shows the detailed internal hardware structure of an RDC-station.

2.2  PROGRAMS 

Fig. 5 shows the distribution of the program system in the RDC-system. Each station
contains identical transport systems, status reporting systems and line reconfiguration systems.
All these programs are microprogrammed and performed by the LµP.

The RDC-stations contain identical network operating systems, local PEARL operating systems
and run time systems in the levels higher up. These programs reside in the PµP and are
called DISPOS (Distributed PEARL Operating System). And finally, all application programs are
residently loaded for both cases of normal and reconfigured operation. Both computers of the
central master control room contain the transport system, the status reporting system and
the line reconfiguration system running on their LµPs. Higher up we see the network
operating system with a local observer and the manufacturer supplied operating system ORG 310.
And finally, there are the application programs of the EAF system and the program production
and loading system.


Fig.  5:  The  distributed  program  system  of  RDC 


2.3  DATA BASE

Fig. 6 shows the distribution of the data base of the RDC-system. Each station has a list
of system statuses, a list of actual process values, a list of control parameters and a
list of internal process data. Moreover, each station has access to the actual values
coming from the technical process. Copies for initialization are located on the bulk memories




2.4  SUMMARY OF THE MOST IMPORTANT FEATURES
The most important features of the RDC-system are

dynamic redundancy for all system levels and modules;

fault tolerance, distributed fault diagnosis, also transmitted to and displayed on the
central colour screen system;

DDD-type bus system:

Decentralized channel assignment
Decentralized message transmission
Decentralized message absorption

transmission medium: optical fibers

g 

transfer rate: 10 bit per sec;

distributed, hierarchical data base;

MULTICOMPUTER-PEARL;

dynamic loader for automatic on-line loading;

central control panel with flow chart representation with realtime update, curves and so on,
rolling map, light-pen interaction with operator guidance;

automatic selection of display information at changes of the process state.

3.  CLASSIFICATION OF TECHNICAL COMMUNICATION SYSTEMS WITH RESPECT TO TRANSPORT AND
CONTROL PRINCIPLES

An essential part of the RDC-system is its communication system. In the following a
classification scheme for transport and control principles in technical communication systems
with an analytical assessment of the performance is given /6/.

Technical communication systems have to convey messages between several stations and wide
use is made of bus systems. Three main actions must be performed for each complete
transmission cycle:

1.  bus  assignment 

2.  message  transmission 

3.  logical  and  physical  message  absorption 

These actions can be performed with or without the aid of a central master. Accordingly
we get a classification scheme as shown in Fig. 7. The RDC-system is classified
as a DDD-type system.
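
The eight classes follow directly from choosing a central or decentral realisation for each of the three actions. The short Python sketch below enumerates them; it assumes that the letters of a class name follow the order of the three actions listed above and that "Z" stands for central and "D" for decentral, which is consistent with the DDD classification of the RDC-system but is an assumption made here.

```python
from itertools import product

# The three actions of a complete transmission cycle, in the order listed above.
ACTIONS = ("bus assignment", "message transmission", "message absorption")

# Assumed letter code: Z = central, D = decentral.
LETTER = {"central": "Z", "decentral": "D"}


def classify(assignment: str, transmission: str, absorption: str) -> str:
    """Return the three-letter class name for one choice per action."""
    return "".join(LETTER[mode] for mode in (assignment, transmission, absorption))


# All combinations of central/decentral over the three actions.
ALL_CLASSES = [classify(*modes) for modes in product(LETTER, repeat=3)]
```

For example, `classify("decentral", "decentral", "central")` gives `"DDZ"`, and `ALL_CLASSES` runs from `"ZZZ"` to `"DDD"`.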


                                  Message transmission
                         central    decentral    central    decentral
Bus          central       ZZZ         ZDZ         ZZD         ZDD
assignment   decentral     DZZ         DDZ         DZD         DDD
                         Message absorption      Message absorption
                         central                 decentral
                         (i.e. passively         (i.e. actively
                         coupled)                coupled)

Fig. 7: Classification of bus systems


For the analytical assessment two variables are important:

the average maximum throughput available for each station and

the average transmission time T from transmission demand in the source station till
complete reception of the message in the destination station.

Fig. 8 shows the results for 3 classes: ZZZ, DDZ and DDD. We see three axes with
logarithmic scales:

x-axis: Arrival rate in bits per sec

y-axis: Average transmission time T

z-axis: Total number of stations N

The maximum throughput of a DDD system is approximately 4 times higher than that of a ZZZ
system and about 2-3 times higher than that of a DDZ system. The average transmission times T
of ZZZ are higher than those of DDZ, and DDZ has substantially higher transmission times
than DDD in any case.




distribution of source and destination addresses
distribution of interarrival time between messages
distribution of message lengths
distribution of message types
frequency of message sequences

All these distributions depend on location and on time, with steady-state or
unsteady-state behaviour. Moreover, they depend very strongly on the application,
particularly the source and destination address distribution. Typical application classes are

centralized management of distributed control systems (see measurements of the pit
furnaces system)

pipelined technical process

hierarchically structured technical process (tree structure)
uniform distribution (totally meshed communication structure)
and so on.

Because of this complexity the IITB has developed a measuring processor /7/ for
recording the message flow in the RDC ringbus system. But before describing the traffic
flow in the RDC communication system and the applied measuring method it is necessary to
understand how this system works.


Fig.  9:  Queueing  model  of  the  RDC-DDD-type  communication  system 


Fig. 9 shows the queueing model of the communication system. The RDC-stations are connected
by means of serial fiberoptic lines to a ring structure. Each station has a receiver E_i and
a transmitter S_i. In order to achieve communication between the stations each of them has
its own address, and consequently each message carries in its header, besides the message type,
the information where it comes from (source address) and where it is destined to
(destination address). Special types of messages (e.g. the status reporting
messages) have a broadcast address. Let's have a look at the handling of messages within
a station. On receipt of the first word of a message the contained destination address is
compared with the station's address by the LµP. If the addresses are equal the arriving
message will be delivered to the station and absorbed (id^); if they are not equal it will be
forwarded to the next station (X^j). Forwarding takes place with the minimum delay of one word in
all cases in which the queue P_i for passing messages is empty and the station is not
transmitting a message of its own (X^). In the latter case the words of the passing message
are buffered in the first come first serve queue P_i. Messages of the station (X^) are not
allowed to be sent if the station is receiving or transmitting a passing message. By this
buffer insertion mechanism we have a decentralized bus assignment, a decentralized message
transmission and a decentralized message absorption. In this protocol consideration of the
round-trip delay of the transmission medium is not needed, as it is for instance in collision type
local area networks such as ETHERNET.
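
The handling of messages within one station can be illustrated by a small simulation. The following Python sketch is a simplified model written for this purpose and is not the microprogram of the communication processor: the names `Word`, `Station`, `submit` and `step` are hypothetical, every word carries the destination address (in the real system only the header does), broadcast addresses are not modelled, and one call of `step` stands for one word time.

```python
from collections import deque
from typing import NamedTuple, Optional


class Word(NamedTuple):
    dest: int    # destination address (repeated in every word for simplicity)
    src: int     # source address
    last: bool   # True for the final word of a message


class Station:
    def __init__(self, address: int) -> None:
        self.address = address
        self.passing = deque()     # first come first serve queue for passing words
        self.own = deque()         # complete own messages waiting for the line
        self.sending = deque()     # remaining words of the own message being sent
        self.absorbed = []         # words delivered to this station
        self.mid_passing = False   # True while a passing message is being forwarded

    def submit(self, dest: int, length: int) -> None:
        """Queue an own message of `length` words for transmission."""
        words = [Word(dest, self.address, i == length - 1) for i in range(length)]
        self.own.append(words)

    def step(self, incoming: Optional[Word]) -> Optional[Word]:
        """Process one word time; return the word put on the outgoing line."""
        if incoming is not None:
            if incoming.dest == self.address:
                self.absorbed.append(incoming)   # decentralized absorption
            else:
                self.passing.append(incoming)    # to be forwarded
        if self.sending:                         # own message in progress:
            return self.sending.popleft()        # passing words stay buffered
        if self.passing:                         # forward passing traffic
            word = self.passing.popleft()
            self.mid_passing = not word.last
            return word
        if self.own and not self.mid_passing:    # line free: insert own message
            self.sending.extend(self.own.popleft())
            return self.sending.popleft()
        return None
```

In this model a passing message that arrives while the station is sending its own message is held in the queue and forwarded, word by word, as soon as the own message is complete; no central master is involved in any of the three actions.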

In order to detect faults on the ringbus system, each RDC station has its own fault detectors,
and in order to tolerate them there are two fiberoptic rings, one for each direction, with
the respective receiver and transmitter devices. This means that the system can be operated
unidirectionally in either direction or pseudo-bidirectionally in the so-called oscillating
mode of operation, where the direction is changed periodically.

In the DDD-type communication system a lot of key data are available, but locally
distributed. The measuring system was implemented as a further RDC station with special
functions, connected to the serial fiberoptic bus. These functions are:

- observation of the traffic flow at its connecting point;

- sending and receiving of special measuring messages in order to get the transmission
  times;

- buffering with optional preselection and recording of the measured data.

Because the different traffic flows in the pit furnace application are known, the optimal
connecting point is adjacent to the two computers of the master control room (Fig. 10).
Unfortunately, not all distributed traffic flows (e.g. demand messages and the respective
reply messages) can be accessed at this connecting point in bus systems of the class DDD,
but all critical cases of heavy load can be detected in this way.


Fig. 10: Traffic flow on the RDC ringbus system in the pit furnace application




In the case of undisturbed ring operation (Fig. 10A) we have three main message types:
the status reporting messages with the arrival rate X g, the demand messages of the EAF
computer, and the reply messages of the RDC stations with the arrival rate X^. In the
left ring direction operation mode, as shown in this figure, all status reporting messages
and reply messages pass the measuring system. If we change to the right direction, all
status reporting messages and demand messages are accessed. To simulate the oscillating
mode of operation we interrupt the ringbus between the two computers of the master control
room (Fig. 10B). In addition to the message types of the undisturbed ring operation we now
have the messages with the arrival rate X^ that are initiated by the communication computers
adjacent to the interruption points in order to perform the oscillating mode. All messages
to and from the EAF computer and the messages of the communication computers pass the
measuring system.

At each event, a set of data is recorded with the following contents:

message type, source address, destination address, message length and clock time (time
resolution 10 µs).
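The recorded data set maps naturally onto a small record type. The following sketch is only an illustration: the field names, the integer tick representation of the clock time and the helper `round_trip_us` are assumptions; only the five recorded items and the 10 µs resolution come from the text.

```python
from dataclasses import dataclass

TICK_US = 10  # clock resolution stated in the text: 10 microseconds


@dataclass(frozen=True)
class KeyRecord:
    message_type: str
    source: int        # source address
    destination: int   # destination address
    length_words: int  # message length
    clock_ticks: int   # clock time in units of TICK_US (assumed representation)


def round_trip_us(sent: KeyRecord, received: KeyRecord) -> int:
    """Transmission time of a measuring message, in microseconds."""
    return (received.clock_ticks - sent.clock_ticks) * TICK_US
```

With this representation a measured transmission time can never be resolved more finely than one tick, so every result is a multiple of 10 µs.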

These records are concentrated stepwise and, if necessary, preselected, recorded on a
magnetic tape and finally transferred into a mainframe computer system (Fig. 11).


Fig. 11: The measuring and analyzing system




Here, a lot of analyses can be done. Examples of results evaluated from measurements
in the pit furnace application are shown in Figs. 12-14. Fig. 12 shows typical load
distributions of the traffic flow for the transmission into the right direction (A),
including X and X; the left direction (B), including X and X; and the case of the
oscillating mode of operation (C), including X^, X^ and Xy.




Fig. 14 shows the round trip delay time for test messages with lengths of 1 data word
(A) and of 64 data words (B) respectively.


[Panels (A) and (B). y-axis: Frequency; x-axis: Transmission time in time units (tu),
with ticks at 0.1, 0.3, 1, 3, 10, 30 and 100 tu]

Fig. 14: Distribution of the transmission times for round trip test messages


In general one sees that all distributions are very peaky, in particular in the case of
message lengths. The distributions depend very strongly on the application. It is therefore
necessary to do the classification, the analytical work, simulation work if possible,
and measurements, too. The classification gives a framework for thinking; the analytical
work forces a detailed modelling of the general system behaviour, which yields an assessment
that is good for a general view and for finding instabilities or things like that;
simulation makes it possible to handle more complex models or more special situations
which represent, for instance, certain applications; and finally, the measurements with
application and artificial patterns are necessary for gaining confidence in all these
models.

5.  REFERENCES 

/1/ Borsi, L.; E. Pavlik, 1980, "Konzepte und Strukturen dezentraler
Prozeßautomatisierungssysteme", Regelungstechnische Praxis 9 (1980), S. 302-309,
R. Oldenburg-Verlag München.

/2/ Syrbe, M., 1981, "The description of fault-tolerant systems", Process automation
1/1981.

/3/ Heger, D.; H. Steusloff; M. Syrbe, 1979, "Echtzeitrechnersystem mit verteilten
Mikroprozessoren", BMFT-Forschungsbericht DV 79-01, Datenverarbeitung, April 1979.

/4/ Heger, D. (Hrsg.), 1981, "Systemergänzungen und Piloterprobung eines fehlertoleranten
Echtzeitrechnersystems mit verteilten Mikroprozessoren (RDC-System)",
BMFT-Forschungsbericht DV 81- , Datenverarbeitung, Mai 1981 (to be published).

/5/ Steusloff, H., 1980, "Programming distributed computer systems with higher level
languages", Distributed computer control systems, Proceedings of the IFAC workshop,
Tampa, Florida, U.S.A., 2-4 Oct. 1979, Pergamon Press, New York.

/6/ Heger, D., 1979, "Kommunikationsverfahren für Sammelleitungssysteme und deren
Leistungsbeschreibung", Regelungstechnik 1979.

/7/ Heger, D.; R. Bahre, 1980, "Meßprozessor für Rekonfigurationsabläufe und
Übertragungsströme im Echtzeitrechnersystem mit verteilten Mikroprozessoren (RDC-System)",
IITB-Mitteilungen, Karlsruhe, FhG-Berichte, München, S. 2-80.




CONTRIBUTION TO THE DISCUSSION by Mr. J. Schoelch, IABG, Ottobrunn / Germany.

Question: How do you tolerate an interruption on the ringbus system?

Answer: Let me explain the fault tolerance by three typical examples.

a) Communication computer failure:

Faults within a station are detected by the fault diagnosis unit (8S/SR) or by
self test. In case of error the communication computer is disconnected by the
bus switch unit (BSU, fig. 4) and the light receiver / transmitter module
(LSEM, fig. 4) works as a repeater device.

b) Line interruption in one direction (e.g. break of one fiberoptic line):

At any time there are signals on the fiberoptic lines of the current transmission
direction, either messages or delimiter bytes between the messages.

If a station receives no more signals, the receiver / transmitter adaption
(SEA, fig. 4) recognizes this by a time-out and the LµP (fig. 4) sends a broadcast
message to all other LµPs as a command for changing the ring direction.

c) Line interruption in both directions (e.g. breakdown of the power supply in an RDC
station):

The LµP adjacent to the interruption acts as in case b), but after changing the
direction the ring is not closed. This is recognized by the two LµPs adjacent
to the interruption point, and the transmission direction is then changed
periodically by them. So, in one time period each station can send messages to all
other stations on the left, and in the following time period to all stations on
the right. Thus, the traffic flow is chopped periodically into two directions
and the performance is only reduced by the additional messages reversing the
transmission direction. If one of the LµPs adjacent to the interruption point
receives signals from a previously failed direction, a reconfiguration procedure
is started with the re-integration of repaired modules. By this procedure the
system reconfigures towards the ring configuration with its full performance.
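The three cases of the answer can be condensed into a small decision sketch. This is an illustrative model under assumptions, not the RDC protocol itself: the names `Mode`, `next_mode` and `reachable` are hypothetical, and the fault detectors are reduced to two flags telling whether each ring direction is currently closed.

```python
from enum import Enum


class Mode(Enum):
    LEFT = "left"
    RIGHT = "right"
    OSCILLATING = "oscillating"


def next_mode(current, left_closed, right_closed):
    """Choose the operating mode once the fault detectors have reported."""
    if current is Mode.LEFT and left_closed:
        return Mode.LEFT             # current direction still works
    if current is Mode.RIGHT and right_closed:
        return Mode.RIGHT
    if left_closed:
        return Mode.LEFT             # case b), or re-integration after repair
    if right_closed:
        return Mode.RIGHT
    return Mode.OSCILLATING          # case c): neither direction forms a ring


def reachable(stations, sender, direction):
    """Stations a sender can address in one oscillation period.

    `stations` lists the stations in order from one side of the
    interruption point to the other.
    """
    i = stations.index(sender)
    return stations[:i] if direction is Mode.LEFT else stations[i + 1:]
```

In the oscillating mode the two periods together cover every other station, which is why only the direction-reversing messages cost performance.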


DISPERSED  SENSOR  PROCESSING  MESH  PROJECT 
by 




Vincent  A.  Mugna 

The  Charles  Stark  Draper  Laboratory,  Inc. 
555  Technology  Square 
Cambridge,  Massachusetts  02139 
U.S.A. 


SUMMARY 


The F-8 Dispersed Sensor Processing Mesh (DSPM) project is an exploratory program involved
in the development and test of the concept of a network communication structure.

The elements of the structure are a Bus Controller and a number of nodes, all of which are
interconnected by multiple data flow paths. This structure is proposed as the communication
medium between the subsystems of a distributed avionic system.

The multiplicity of data paths between nodes, in conjunction with an intelligent controller
that constructs, monitors, and controls a virtual data bus composed of these nodes
and their interconnecting links, is envisioned as a structure much more tolerant to
faults and physical damage than the presently employed avionic data busses.

The virtual data bus constructed by the Bus Controller can be reconfigured by the Controller
in response to sensed faults and physical damage; thus, it should be capable of
maintaining communication between the various subsystems of an avionics system through
more numerous and more severe occurrences of faults and physical damage.

In order to test and establish a data base for this proposed communication structure, the
elements that comprise it must be designed and built. These elements are not just the
hardware that is inherent to the structure; they also include the algorithms mechanized
in the Bus Controller's software, the operating characteristics of the network, and the
communication protocol used. The decisions made during the development of the system
must be carefully thought out and mechanized in the most efficient and reliable manner
possible. This paper addresses the design and associated decisions made during the
development of the network hardware and software.

1. INTRODUCTION

The evolution and development of integrated avionic systems in which a number of distributed,
dedicated processors perform specific aircraft-control and mission-oriented functions
has demonstrated that the communication structure between the elements is the critical
hardware element in a distributed computer architecture.

To date, the solution to this problem has concentrated on the means to adapt a bus to
such applications. The inherent vulnerability of buses to physical damage and the disabling
effects of failed subsystems connected to a bus have required a number of developmental
modifications to the basic bus. Multiple buses, redundant buses, cross-strapped
buses, and various combinations of these are typical bus modifications. An alternative
approach, a network communication structure, has been proposed as a better solution to
the problem.

The F-8 Dispersed Sensor Processing Mesh (DSPM) project has integrated a communication
network structure with an engineering version of a flight-qualified, triply-redundant,
digital flight-control system, the F-8 Digital Fly-By-Wire (F-8 DFBW) System. Operating
the communication network in parallel with the existing architecture gives a concrete
comparison of dedicated sensor/effector interfaces with the communication network strategy.

Figure  1  displays  the  layout  of  the  DSPM  network.  The  primary  node  (Bus  Controller)  is 
the  central  processor.  Each  other  node  services  one  or  a  small  collection  of  devices  that 
are  physically  very  close  together. 

2.  GENERAL  CHARACTERISTICS 

Messages flow exclusively from the central processor to the nodes and from the nodes to
the central processor. Message routing through explicitly activated links controls the
flow of information through the net. Messages are relayed from the central processor and
back on a bit-by-bit basis without respect to the intended recipient. The nodes route the
information through an assigned reconfigurable conductivity. The Input/Output (I/O) port
of each node is always in a listening mode to receive any incoming messages. Configuration
commands from the central processor activate particular node ports to transmit messages
either back to the central processor or on to other nodes in the network.

The network nodes and links are fully duplex; they can handle incoming and outgoing data
simultaneously. The network structure configured by the central processor creates a
tree-structured bus between the central processor and the nodes. Because of the multiple
paths available from a given node to the central processor, it is believed that this
structure will be more reliable and tolerant to faults and physical damage.
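One way to picture the tree-structured bus is as a spanning tree grown outward from the central processor over the links that are currently usable. The sketch below uses a breadth-first search purely as an assumed stand-in, since the text does not state which algorithm the Bus Controller uses; the function name `build_tree` and the small example mesh in the usage note are hypothetical.

```python
from collections import deque


def build_tree(links, root, failed=frozenset()):
    """Return {node: parent} for a breadth-first tree rooted at `root`.

    `links` is an iterable of (a, b) pairs; `failed` is a set of
    frozenset({a, b}) entries naming links that must not be used.
    """
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue                     # skip links known to be faulty
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:  # first path found becomes the active one
                parent[neighbour] = node
                queue.append(neighbour)
    return parent
```

For a mesh with links BC-N1, BC-N2, N1-N2, N1-N3 and N2-N3, node N3 is first reached through N1; marking the N1-N3 link as failed and rebuilding reaches it through N2 instead, which is the kind of reconfiguration the multiple paths make possible.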


The initial design decisions on the DSPM network involved the central processor. The
central processor is the most important element to the operation of the communication
network. The need for high reliability requires either an extremely reliable processor
or redundancy. The fact that the general reliability of processors has not reached the
desired level and the relatively low unit cost of processors led us to use triplicate
central-processor processing elements.

Experience gained on the triplex F-8 DFBW and other redundant systems has shown that
fault detection and fault masking are greatly enhanced by the synchronous operation of
redundant systems. Furthermore, the desire to mechanize a hardware mechanism for the data
exchange/compare between processors reinforced this decision and resulted in a design that
uses instruction-level synchronization.

The next area of concern involved isolation between the central-processor channels. Each
of the processors in the central processor simultaneously executes identical code. Each
processor, though, has unique I/O interfaces. These unique I/O interfaces are mechanized
through the use of globally unique I/O addresses that enable each processor to manipulate
its own I/O registers without impacting on the other processors' registers, i.e., an
instruction that addresses an occupied address in Channel A actually manipulates that
register while the same instruction in Channels B and C is a vacant operation.
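The effect of globally unique I/O addresses can be illustrated with a toy model in which every channel executes the same store but only the channel that owns the address is affected. The class name `Channel` and the example addresses are hypothetical; the sketch only mirrors the occupied/vacant distinction made in the text.

```python
class Channel:
    """One processor channel with its own, globally unique I/O addresses."""

    def __init__(self, name, own_addresses):
        self.name = name
        self.registers = {addr: 0 for addr in own_addresses}

    def store(self, address, value):
        """Execute a store; return True only if a register was written."""
        if address in self.registers:
            self.registers[address] = value   # occupied address: real write
            return True
        return False                          # vacant operation in this channel


# Identical code runs in all three channels (addresses are illustrative only).
channels = [Channel("A", [0x100]), Channel("B", [0x200]), Channel("C", [0x300])]
results = [ch.store(0x100, 7) for ch in channels]
```

All three processors spend the same instruction time on the store, so synchronization is preserved even though only Channel A's register changes.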

3. NETWORK HARDWARE

The hardware elements which comprise the communication network consist of a central processor
Bus Controller, six nodes, and the interconnecting links. The following paragraphs
describe these items.


[Three AP-101 computers, one each for Channels A, B and C]

Figure 1. Dispersed sensor processing mesh.




3.1  Primary  Node — Bus  Controller 

The  central  intelligence  of  this  network  resides  in  the  software  algorithms  mechanized  in 
the  Bus  Controller.  These  algorithms  manage  both  the  simple  message  transfers  to  and 
from  the  various  nodes  and  the  critical  functions  relating  to  the  organization  and  main¬ 
tenance  of  the  network. 

The Bus Controller must configure the nodes to create a suitable network structure,
reestablish this structure after power interrupts, detect failed links or nodes, and either
rebuild or repair the structure to circumvent the failure. Also, since all links are not
in simultaneous use, the inactive links must be activated periodically and used to replace
other active links in the structure. This periodic activation maintains up-to-date
status information on all parts of the network. The Bus Controller's hardware and software
design must incorporate the highest reliability and fault tolerance obtainable to
execute these functions.

Each third of the Bus Controller contains a microprocessor, Random Access/Programmable
Read Only Memory (RAM/PROM), an Oscillator, Clock Interval Timers (CITs), a Watchdog
Timer, I/O interfaces, an Interprocessor Communicator, and an External Interrupt
Synchronizer. (See Figure 2.)

Although  each  section  of  the  Bus  Controller  operates  independently,  they  are  synchronized 
with  each  other  at  the  instruction  level,  and  the  exchange/comparison  of  data  makes  the 
Bus  Controller's  redundancy  transparent  to  the  network. 

3.1.2  Bus  Controller  Processors 

The  primary  considerations  in  selecting  the  microprocessors  for  the  Bus  Controller  were 
a  16-bit  processor  and  a  large  instruction  repertory. 

Sixteen-bit  processor.  Experience  in  programming  various  computer  control  systems 
from  spacecraft  to  ground  support  equipment  has  shown  that  most  data  manipulation  and 
formatting  requires  a  minimum  of  16  bits  for  both  handling  ease  and  data  granularity. 


Figure 2. Bus Controller.




Large instruction repertory. The programming task eases greatly when the processor
has a large instruction repertory. It is much easier to eliminate certain types of
instructions from the programming (i.e., exotic addressing modes) than to work with
unmechanized instructions.

3.1.3  Bus  Controller  Memory 

The Bus Controller memory complement involved two main considerations:

(1) Bits/volume.

(2) Availability of units.

The  use  of  the  erasable  memory  required  to  develop,  test,  and  modify  the  operational 
software  was  secondary  to  the  design  but  important  to  the  research  aspect  of  this  project. 


3.1.4  Bus  Controller  Clock 

Each processor has its own dedicated oscillator as a clock source. The decision to
operate at instruction-level synchronization between processors required the active control
of these oscillators. The incorporation of a fault-tolerant, phase-locked clock
design developed (and subsequently patented) by W. M. Daly and J. F. McKenna of CSDL
under NASA sponsorship met this requirement. This clock has a nominal frequency of
16 megahertz with a phase error of 5 degrees (Figure 3). The design is such that the loss
of any one oscillator will not inhibit the phase lock between the remaining oscillators.

3.1.5  Interval  Timer 

The time-cyclic nature of many operational functions in an avionics system and the general
need for an interval timer resulted in the incorporation of two 16-bit,
1-microsecond-per-bit interval timers. Experience has shown that this degree of
mechanization meets all the expected uses of an interval timer.
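As a quick check of what such a timer can cover, the following sketch works out its span, assuming that "1 microsecond per bit" means one count of the 16-bit register equals one microsecond. The helper names and the 20 ms example period are illustrative and do not come from the paper.

```python
TIMER_BITS = 16
LSB_US = 1  # assumed: one count corresponds to one microsecond


def timer_span_us():
    """Total span covered by the full count range, in microseconds."""
    return (2 ** TIMER_BITS) * LSB_US


def counts_for(period_us):
    """Counts to load for a period; ValueError if it exceeds one timer."""
    counts = round(period_us / LSB_US)
    if not 0 < counts <= 2 ** TIMER_BITS - 1:
        raise ValueError("period does not fit in a single 16-bit timer")
    return counts
```

Under this reading one timer spans 65,536 µs, about 65.5 ms, so a hypothetical 20 ms frame period needs 20,000 counts and fits comfortably, while longer intervals would have to be built up in software.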


[Reference inputs from Controllers A, B and C]

Figure 3. Bus Controller: fault-tolerant clock.


3.1.6  Watchdog  Timer 

A Watchdog Timer, which requires periodic servicing by system software, is an excellent
detector of correct software program execution.

3.1.7  DFBW  Computer  I/O 

The  DFBW  computer  to  Bus  Controller  I/O  interface  is  unique  in  that  the  triplex  Bus 
Controller  processors  act  as  a  single  processor  but  must  interface  to  the  DFBW  triplex 
computers  as  separate  units.  This  is  due  to  the  asynchronous  relationship  between  the 
Bus  Controller  and  DFBW  system  and  the  DFBW  system  frame  synchronization  versus  the 
Bus  Controller  instruction-level  synchronization. 




The complexity of this interface is increased further because the Bus Controller is
considered an external device by the DFBW computer, and the external device must respond
with the correct handshaking protocol during I/O execution and initiate buffered I/O
execution. In fact, this interface must function for both direct output, which is a
DFBW computer macro instruction execution, and buffered I/O, which is device initiated
and executes in the DFBW computer on a cycle-steal basis transparent to software program
execution. The direct output execution time is approximately 10 microseconds to
transfer two 16-bit words across this interface, whereas the buffered I/O execution time
is approximately 15 microseconds per 16-bit word transferred across the same interface.
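The two quoted execution times imply quite different transfer rates, which a short calculation makes explicit. The function name is hypothetical and the figures are the approximate values given in the text, so the results are estimates only.

```python
WORD_BITS = 16


def rate_mbit_per_s(words, micros):
    """Transfer rate in Mbit/s (bits per microsecond) for `words` words."""
    return words * WORD_BITS / micros


direct = rate_mbit_per_s(words=2, micros=10)    # two words in about 10 us
buffered = rate_mbit_per_s(words=1, micros=15)  # one word in about 15 us
```

Direct output therefore moves roughly 3.2 Mbit/s across the interface, about three times the roughly 1.07 Mbit/s of buffered I/O, although buffered I/O proceeds by cycle stealing without occupying the DFBW program.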

In order to meet these diverse requirements, this interface is mechanized with two
first-in, first-out (FIFO) buffers; control logic; and a FIFO status register between each
DFBW computer and Bus Controller processor (Figure 4). Communication between this interface
and the Bus Controller processor is on an interrupt basis for information flow from the
DFBW computer to the Bus Controller and on program control for information flow from the
Bus Controller to the DFBW computer.

3.1.8  Network  I/O  Interface 

Three ports on the Bus Controller connect to the communication network. Each port is
uniquely associated with a processor.

The interposition of a bus between the controller processors and the three ports such
that any processor could address any port would require either I/O serialization between
processors and ports or the interconnection of a minimum of 17 lines between each
processor and each port. The need to interconnect this number of wires would create a
packaging problem and a possible multiple source of failures that could affect reliability.
Serialization of this processor/port interconnect would impose a transmission-time
penalty on I/O. Therefore, each processor is uniquely associated with an I/O port,
and the use of globally unique addresses for I/O registers allows a single processor to
manipulate a port without impact on other ports.

Each port is a 1-megahertz serial connection into the communication network. This
serial connection could be mechanized with a number of different protocols. The protocols
examined were: Synchronous Data Link Control (SDLC), Asynchronous Digital Data
Link, and MIL-STD-1553. Although the first two of these methods would be easier to
mechanize from both an operational and a hardware point of view, the MIL-STD-1553 was
chosen for its compatibility with a wide range of presently operating avionics subsystems
(Figure 5).

[AP-101 output bus, AP-101 control lines, AP-101 input bus, 68000 data bus]

Figure 4. Bus Controller: AP-101 interface.

Figure 5. Bus Controller: network interface.

The 1553 interface actually mechanized in the Bus Controller and each node is modified
to conform to the network operating characteristics. Communications between the Bus
Controller and the addressed node are received and retransmitted by each node between
the Bus Controller and the addressed node. Transmission of a standard 1553
pulse-code-modulated Manchester code would suffer both distortion and time delay by this
receive/transmit function. Therefore, the 1553 interface in the Bus Controller and each
node is modified such that the Manchester code is converted to a self-clocking
pulse-width-modulated code for transmission on the interconnecting link, converted back to
Manchester code within each node for decode, and reconverted to self-clocking
pulse-width-modulated code for retransmission (Figures 5 and 6). Thus, the actual 1553
interface can exist between the node and its attached device but not between the Bus
Controller and the device.
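The conversion chain can be sketched with two toy line codes. The Manchester convention used here (a one is high then low, a zero is low then high) follows MIL-STD-1553; the pulse-width code, in which every bit cell starts high and a one stays high for two thirds of the cell while a zero stays high for one third, is purely an assumed example because the paper does not give the actual pulse widths. Sync patterns and parity are ignored, and all function names are hypothetical.

```python
def manchester_encode(bits):
    """Two half-bit levels per bit: 1 -> high, low; 0 -> low, high."""
    halves = []
    for bit in bits:
        halves += [1, 0] if bit else [0, 1]
    return halves


def manchester_decode(halves):
    if len(halves) % 2:
        raise ValueError("incomplete bit cell")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if first == 1 else 0)
    return bits


def pwm_encode(bits):
    """Three samples per bit cell; the rising edge at cell start carries the clock."""
    thirds = []
    for bit in bits:
        thirds += [1, 1, 0] if bit else [1, 0, 0]
    return thirds


def pwm_decode(thirds):
    if len(thirds) % 3:
        raise ValueError("incomplete bit cell")
    bits = []
    for i in range(0, len(thirds), 3):
        cell = thirds[i:i + 3]
        if cell == [1, 1, 0]:
            bits.append(1)
        elif cell == [1, 0, 0]:
            bits.append(0)
        else:
            raise ValueError("invalid pulse width")
    return bits


def relay(link_signal):
    """One node hop: PWM in, Manchester inside the node, PWM out again."""
    manchester = manchester_encode(pwm_decode(link_signal))
    return pwm_encode(manchester_decode(manchester))
```

Because every hop regenerates the pulse-width code from decoded bits, the signal leaving a node is a fresh copy rather than an accumulated distortion of the incoming waveform.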


Figure 6(a). Node: link input.

Figure 6(b). Node: link output.


3.2 Interprocessor Communicator

The exchange and comparison of all I/O data between processors is essential to failure
detection and fault masking in a multiprocessor controller. Various combinations of
software and hardware logic can be chosen as the method through which the exchange/
comparison of data can be mechanized. For the DSPM, the design uses all hardware logic,
and the Interprocessor Communicator is the mechanism that executes this function
(Figures 7 and 8).


Figure 7. Interprocessor communicator.

[Inputs from Channels B and C; outputs to Channels B and C]

Figure 8. Interprocessor communicator.




The Interprocessor Communicator is used to distribute data from one channel to all other
channels and to exchange data simultaneously from all channels with comparison, error
flagging, and correction. The data exchange mechanism only functions correctly between
synchronized channels. An attempt to exchange data between unsynchronized channels will
generally cause a loss of the transmitted information (the transmitted information is
not correctly received by anyone except the transmitter himself). An attempt to accept
a data value from an unsynchronized channel produces an indeterminate result, most often
a zero. Table 1 summarizes the functional registers of the Interprocessor Communicator.


Table 1. Interprocessor communicator registers.

Name  Function
----  ------------------------------------------------------------------------

Transmitter Registers

XV    Transmit a single data item from all channels. The received value
      will be the bit-by-bit 2-of-3 majority function. A bit fault will
      set the corresponding bit of the error latch.

X1    Transmit a single data item from Channel 1 to all channels. The
      received value in all channels will be that value. The error latch
      status is unaffected.

X2    Transmit a single data item from Channel 2 to all channels. The
      received value in all channels will be that value. The error latch
      status is unaffected.

X3    Transmit a single data item from Channel 3 to all channels. The
      received value in all channels will be that value. The error latch
      status is unaffected.

Receive Register

XR    The received result of a transmitted operation can be retrieved
      from this register. The register is read/write and can be directly
      loaded by a store into the register.

Status Register

XE    A status register that contains 4 bits.

      Bit 0 is an error bit corresponding to Channel 1.
      Bit 1 is an error bit corresponding to Channel 2.
      Bit 2 is an error bit corresponding to Channel 3.
      Bits 3-14 are always zero.
      Bit 15 is a summary bit that is set if any of the bits 0 thru 2 are
      set and is clear if they are all clear.

      The register bits are set if a bit fault is noted for the
      corresponding channel during an XV operation.

      The register is read/write and can also be directly loaded by a store
      to the register. Only bits 0 thru 2 (and by function bit 15) can be
      altered by a store to the register.

A store to a transmit register and a subsequent fetch from the receive register, XR,
exchanges data between channels. The transmit register selection determines the character
of the exchange. All channels must execute the store to the exchange register in
synchronism. The error latch, XE, is set if errors are detected during XV exchanges.
Once an exchange has been initiated by a store to an exchange register, the processor
proceeds with normal instruction execution. While a transfer is in progress, a reference
to the exchange mechanism suspends the processor instruction execution until completion
of the exchange. An exchange operation takes 3 microseconds. An immediate reference to
the receive register, XR, after a store is thus likely to suspend operation of the
processor for about 1 microsecond to allow time to complete the exchange. Since the
interlock mechanism is automatic and the maximum loss of processor time per exchange is
negligible, the programmer need not concern himself with either assuring completion of the
exchange before accessing the received data or padding the program between exchange
requests and subsequent receiver register, XR, accessing.

This design was chosen because the exchange/comparison of data takes minimal execution
time, requires instruction-level synchronization between processors, and avoids the need
to establish criteria for agreement between two nearly identical results. Agreement is,
by definition, bit-for-bit.




Through the use of globally unique I/O addresses, the various Interprocessor Communicator
transmitter registers are mapped to different memory locations for each Bus Controller
processor. Thus, only Channel 1 can manipulate the X1 transmit register while the execution
of an identical instruction in Channels 2 and 3 is a vacant operation. However,
the instruction execution sets the address latch in the communicator which then passes
the correct output to the receiver.

When  all  processors  simultaneously  address  the  XV  transmitter,  a  bit-for-bit  comparison 
of  the  inserted  data  is  performed.  The  voted  results  are  passed  to  the  receiver,  XR, 
and  XE  records  any  discrepancy.  The  processors  can  then  use  the  XE  information  for  fault 
detection. 
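A minimal Python sketch of the voted (XV) exchange follows. The text states only that the comparison is bit-for-bit, that voted results reach XR, and that XE records any discrepancy; the two-out-of-three bitwise majority rule, the 16-bit word width, and the name `voted_exchange` are assumptions made for illustration.

```python
def voted_exchange(ch1: int, ch2: int, ch3: int, width: int = 16):
    """Return (xr, xe) for one voted exchange among three channels.

    xr: bitwise majority of the three transmitted words (assumed rule).
    xe: True if the three words were not bit-for-bit identical.
    """
    mask = (1 << width) - 1
    a, b, c = ch1 & mask, ch2 & mask, ch3 & mask
    xr = (a & b) | (a & c) | (b & c)  # each bit takes the value held by at least two channels
    xe = not (a == b == c)            # any disagreement sets the error latch
    return xr, xe
```

With identical inputs the latch stays clear; a single deviating channel is outvoted and flagged, so software can inspect XE for fault detection as the text describes.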

3.3  External  Interrupt  Synchronizer 

The instruction-level synchronization between processors imposed the requirement that all
processors process external interrupts simultaneously even though the external device
generating the interrupt was associated with only one processor.


Devices requiring service signal the event through the generation of an interrupt. Since
these interrupts can occur at random times in each channel and, if processed randomly,
would destroy the instruction-level synchronization between processors, an Interrupt Handler
is positioned between the devices and the processor (Figure 9).


Figure 9. Interrupt handler (signals shown: INT 1, INT 2, INT 3, SYNC).


The function of the Interrupt Handler is to record the identity of the interrupting
device and transmit the fact that an interrupt is pending to the Interrupt Handlers
associated with the other processors. When all Interrupt Handlers have been notified, an
interrupt is simultaneously applied to each processor. The interrupt that actually triggers
interrupt processing in the processors is tagged as originating with some actual channel
device (i.e., interrupts from Channel 1 devices will, through the Interrupt Handler,
generate a level 1 interrupt, while Channels 2 and 3 will generate level 2 and level 3
interrupts, respectively). The interrupt processing routine that begins execution will,
therefore, only address the actual device in the processor responsible for servicing that
device. In its simplest form, each processor has three routines to service a device,
and the particular routine executed depends upon the interrupt level seen by the processor.

Use of the Interprocessor Communicator makes the singularity of the processor/device
relationship transparent to the processors. The routine that executes in the processors
allows only one processor to manipulate the device register, but all processors receive the
data from the actual interrupting device through the Interprocessor Communicator, which has
an addressed transmit register that is keyed to the interrupt level.
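The interrupt flow described above can be sketched in Python as follows. This is an illustrative model only: the function name `handle_interrupt`, the use of a dictionary for the addressed transmit registers, and the callback `read_device` are hypothetical, and the hardware notification between Interrupt Handlers is reduced to a single function call.

```python
def handle_interrupt(channels, source_channel, read_device):
    """Model one synchronized interrupt across all processors.

    channels:        channel numbers of the processors, e.g. [1, 2, 3]
    source_channel:  channel whose device raised the interrupt
    read_device:     callable that reads the real device register

    Returns (level, received), where `received` maps each channel to the
    data its processor obtains through the communicator.
    """
    if source_channel not in channels:
        raise ValueError("interrupt source is not a known channel")

    # Every processor sees the same level, equal to the source channel.
    level = source_channel

    # Every processor runs the same routine; only the owner touches the device.
    transmit = {}
    for ch in channels:
        if ch == level:
            transmit[ch] = read_device()  # real device access
        else:
            transmit[ch] = None           # vacant operation on other channels

    # The communicator passes on the register keyed to the interrupt level.
    received = transmit[level]
    return level, {ch: received for ch in channels}
```

The point of the sketch is that the device is read exactly once, yet every processor ends up with the same data and therefore stays in step.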

The identity of the actual interrupting device is supplied to the processor in an
Interrupt Status Word extracted from the Interrupt Handler when interrupt processing begins.
The various channel devices will set a bit in this word to indicate service required.
This bit is cleared when the device has been serviced. Figure 10 is an example of the
software use of the Interprocessor Communicator in response to inputs from the External
Interrupt Handler.


4.  NETWORK  MESSAGES 

The information flow in the network consists of Command messages and Response messages.
Command  messages  are  at  least  two  words  long  and  consist  of  a  command  word  and  one  or  more 
data  words.  Response  messages  are  either  a  single  status  word  or  a  number  of  data  words. 

The Bus Controller transmits Command messages into the network of the form shown in
Figure 11 and Table 2. Command messages have two formats: Node directed messages and Device
directed messages.




4.1  Node  Directed  Messages 

Node  directed  messages  consist  of  a  Command  word  that  identifies  the  message  and  a  data 
word  that  defines  the  node  ports  enable/disable  configuration. 

4.2  Device  Directed  Messages 

Device directed messages consist of a Command word that identifies the message and one
or more data words. The most significant bits (MSB) of the first data word following the
command word signify whether it is a true data word for the device or a word that
defines some action the device is to perform.


Figure  10.  Interrupt  controller  (interrupt  levels) 




[Figure 11 diagram: bit-time layout of the command word (SYNC, REMOTE TERMINAL ADDRESS,
T/R, SUBADDRESS/MODE, DATA WORD COUNT/MODE CODE) and of the data word (SYNC, DATA, P).]

Figure 11. Word formats.


Table 2. Command message format.

Command Messages

Command Word
  Field                       Value   Definition
  Remote Terminal Address     11111   Message to Node
                              00000   Reserved
                              00001   Message to Device 1
                              00010   Message to Device 2
                              etc.
  Subaddress/Mode             NNNNN   Individual Node Address
  Data Word Count/Mode Code   00001   Number of Data Words Following
                              00010   Command Word
                              etc.
  T/R Bit                             Always set to 0

Data Word
  Value               Definition
  1XXXXXXXXXXXAAAA    Node Port Enable/Disable
  0DDDDDDDDDDDDDDD    Data or Command to Device
  0000DDDDDDDDDDDD    Device Data
  0111CCCCCCCCCCCC    Device Command
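As a hedged illustration of Table 2, the Python sketch below packs the 16 information bits of a command word and classifies a data word by its leading bits. The field order (remote terminal address in the top five bits, then T/R, subaddress/mode, and word count) follows the order in which the fields are listed in Figure 11 and is an assumption; the function names `pack_command` and `classify_data_word` are hypothetical. Sync and parity bits are not modelled.

```python
NODE_ADDRESS = 0b11111  # Table 2: remote terminal address meaning "Message to Node"


def pack_command(rt_address: int, subaddress: int, word_count: int) -> int:
    """Pack the command word fields into 16 bits (assumed field order)."""
    for name, value in (("rt_address", rt_address),
                        ("subaddress", subaddress),
                        ("word_count", word_count)):
        if not 0 <= value <= 0b11111:
            raise ValueError(f"{name} must fit in 5 bits")
    tr_bit = 0  # Table 2: the T/R bit is always set to 0
    return (rt_address << 11) | (tr_bit << 10) | (subaddress << 5) | word_count


def classify_data_word(word: int) -> str:
    """Classify a 16-bit data word by the leading-bit patterns in Table 2."""
    if word & 0x8000:                 # 1XXXXXXXXXXXAAAA
        return "node port enable/disable"
    top4 = (word >> 12) & 0xF
    if top4 == 0b0000:                # 0000DDDDDDDDDDDD
        return "device data"
    if top4 == 0b0111:                # 0111CCCCCCCCCCCC
        return "device command"
    return "data or command to device"  # general 0DDD... pattern
```

For example, `pack_command(NODE_ADDRESS, 0b00011, 1)` builds a node directed command for node 3 followed by one data word.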

4.3 Node Response Message

The Response Message issued by a node to acknowledge the receipt of a Node Directed
Message consists of a single status word as defined in Figure 11. This response is
automatically executed by the node hardware logic.




4.4  Device  Response  Message 

The  Response  Message  issued  by  a  device  to  acknowledge  the  receipt  of  a  Device  Directed 
Message  is  device  dependent  and  message  dependent.  The  response  might  be  a  single  status 
word  signifying  message  received,  or  it  could  be  a  number  of  data  words  in  response  to 
a  data  request. 

4.5 Intramessage Word Gap

The time gap between the words that compose the Command Messages is on the order of 5
microseconds, while the gap between multiple words in a Response Message is device dependent.

5. NODE HARDWARE

Each node is logically and physically divided into two sections: a Communication section
and a Device Servicer. The Communication section executes all functions that relate to
the activation/deactivation of I/O ports, receipt and retransmission of all messages, and
recognition of messages directed to its attached device (Figure 12).


Figure 12. Node detail.


5.1  Node  Communication  Section 

The desire for minimal time-loss impact on the receipt and retransmission of messages
and on the execution/response to port configuration commands dictated the design of this
section. Therefore, each node has a dual message-flow path. One path receives and
retransmits all messages without any examination, and the second simultaneously examines each
message for the identity of its appropriate recipient.

Node configuration commands are executed by hardware logic within the Communication
Section, and an immediate response to receipt of command is transmitted to the Bus
Controller. Messages directed to the attached device are passed to the Device Servicer
section, which transfers the message to the device. In this case, the response is not
necessarily immediate, as it is a received-message function.

5.2  Device  Servicer  Section 

The Device Servicer Section consists of a FIFO buffer, receivers, and control logic
between the node bus and the attached device. This design enables the node to recognize
messages directed to the attached device, temporarily store them, and then transfer the
message to the device at the device acceptance rate. Therefore, the device does not need
to process messages at the network data rate. Data flow from the device to the node for
transmission to the Bus Controller is executed at the device rate.
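The rate decoupling provided by the FIFO buffer can be sketched in Python as below. The class name `DeviceServicer`, the fixed `depth` parameter, and the policy of refusing a word when the buffer is full are assumptions for illustration; the text does not state the buffer depth or the overflow behavior.

```python
from collections import deque


class DeviceServicer:
    """Toy model of the FIFO between the node bus and the attached device."""

    def __init__(self, depth: int):
        self.depth = depth   # hypothetical buffer depth, in words
        self.fifo = deque()

    def receive(self, word: int) -> bool:
        """Store a word arriving at the network data rate.

        Returns False if the buffer is full (assumed overflow policy).
        """
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word)
        return True

    def deliver(self):
        """Hand the oldest stored word to the device, or None if empty.

        The device calls this whenever it is ready, i.e. at its own rate.
        """
        return self.fifo.popleft() if self.fifo else None
```

Words are delivered in arrival order, so the device sees the same message it would have seen on the bus, only later.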


5.3 Attached Device


Devices  that  are  attached  to  nodes  can  have  many  forms  and  functions.  Figure  13  defines 
one  of  the  devices  used  in  the  DSPM  network. 


Figure  13.  Device  servicer. 


5.3.1  Node  Processor 

The node Communication section is designed to use hardware logic while executing all of
its functions. In order to facilitate both the expansion of node capability and node
test, a microprocessor with associated memory was inserted into the design. This
processor interfaces directly to the node bus, and therefore, it can be used to execute more
detailed types of node configuration commands and/or introduce controlled types of
failures into the network structure, i.e., failed link, babbling node, scrambled messages, etc.

6. DSPM SOFTWARE

The  Bus  Controller  and  network  structure  function  directly  influenced  software  design  and 
development.  This  function  is  the  collection  and  dispatch  of  data  between  sensors,  DFBW 
computers,  and  aircraft  effectors.  Essentially,  the  Bus  Controller  is  an  interrupt-driven 
processor  that  executes  tasks  directed  by  the  DFBW  computers. 

6.1  Software  Structure 

The software structure is a list of event-dependent tasks. Either an interrupt occurrence
or a fault detection signals an event. Each task (or response to an event) has a priority
that determines its sequence of execution. Network fault-detection tasks have the highest
priority since no other task can be accomplished while the network is not functional.
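A prioritized, event-driven task list of the kind described above might be sketched in Python as follows. The class name `TaskList`, the convention that a lower number means higher priority, and the first-come ordering among equal priorities are assumptions; the original software was written in assembly language and its data structures are not given in the text.

```python
import heapq


class TaskList:
    """Event-dependent tasks executed in priority order (0 = highest)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: equal priorities run in posting order

    def post(self, priority: int, name: str) -> None:
        """Record a task in response to an interrupt or a detected fault."""
        heapq.heappush(self._heap, (priority, self._seq, name))
        self._seq += 1

    def next_task(self):
        """Return the name of the highest-priority pending task, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Posting a network fault-detection task at priority 0 guarantees it runs before any data-handling task already waiting, matching the rule stated in the text.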

6.2 Code Generation

The use of a general programming language (PASCAL) was investigated early in the software
development phase. A number of routines were coded in both PASCAL and the assembly-level
language. The PASCAL-generated code required too much memory and executed too slowly;
therefore, all coding was performed in assembly-level language.

6.3 Support Software

Support software for the selected microprocessor, and for most microprocessors, suffers from a
prime deficiency: the program must be assembled and the image produced from a single file.
This creates problems in code modularization and the use of mnemonics. Individual programmers
must be careful in their selection and use of mnemonics in order to prevent dual use and
library overflow. Also, module boundaries lose their identity.




The solution to these support software deficiencies was to divide the program into sections,
each of which starts at some absolute address. Then, the program sections were linked
together through a common data section that contains "jump tables" and other necessary linkage
devices for intermodule communications.

7.  CONCLUSION 

A network structure's effectiveness in maintaining communications between the various
elements of a total aircraft avionics system can only be demonstrated by actually building and
testing such a structure. This structure must be exposed to the most comprehensive possible
test matrix of simulated physical and fault damages. The structure's response to these
imposed failures must be measured for both the level and the speed of response. The degree
of data loss during communication, the data bandwidth, and the duty cycle must be measured
under various operating conditions. Only when a comprehensive data base, established
through the operation of such a communication network, has been compiled can an
intelligent decision be made on the effectiveness of this communication strategy.


BIBLIOGRAPHY 


1. Cattel, J.J., and Kemp, A.M., Damage and Fault-Tolerant Network Incorporation
Into F8 Digital Fly-By-Wire System, CSDL Report R-1309, Cambridge, Mass.,
August 1979.

2. Hopkins, A.L., and Smith, T.B., "OSIRIS - A Distributed Fault-Tolerant Control
System", Digest, 14th IEEE Computer Society Int. Conf., San Francisco, Calif.,
March 1977.

3. McKenna, J.F., Demonstration and Evaluation of a Fault-Tolerant Input/Output
Network, CSDL Report R-918, Cambridge, Mass., September 1975.

4. Smith, T.B., A Highly Modular Fault-Tolerant Computer System, Ph.D. Dissertation,
Dept. of Aeronautics and Astronautics, MIT, Cambridge, Mass., November 1973.

5. Smith, T.B., "A Damage- and Fault-Tolerant Input/Output Network", IEEE Transactions
on Computers, Vol. C-24, No. 5, May 1975.

6. Szalai, K.J., and Megna, V.A., "Development of a Multicomputer Fault-Tolerant
Digital Fly-By-Wire System", Third USA-Japan Computer Conference, San Francisco,
Calif., October 1978.


NEXT  GENERATION  MILITARY  AIRCRAFT  WILL  REQUIRE 
HIERARCHICAL/MULTILEVEL  INFORMATION  TRANSFER  SYSTEMS 




Juan  W.  McCuen 
Hughes  Aircraft  Company 
Fullerton,  California,  U.S.A. 

tp  81-ie-a 


ABSTRACT 


Changes in avionic subsystems and mission roles of next generation aircraft will require new concepts in data
transfer. New aircraft will need total airframe/weapon system integration, which means new approaches must be
developed for the interconnection of avionic subsystems. Effort has begun to develop a Military Standard (MIL-STD)
which will define the requirements for a high speed data bus network. The SAE/A-3K Subcommittee on Multiplexing
has accepted the task of developing this MIL-STD. The standard shall characterize a higher order Information
Transfer System (ITS) that will interconnect avionic systems, which contain their own multiplex ITS, into a fully
integrated data complex. The higher order ITS shall employ an operational protocol that will provide subsystems and
common sensors with independence and fault isolation by distributed control of the common data bus.

This paper presents an overview of the functions to be considered in developing the standard. Several ITS
architectural configurations are presented to show bus network topology and data transfer path requirements at the
subsystem black box level and at the aircraft/mission level.

INTRODUCTION 

Future advancements in aircraft basic flight and weapon subsystems, accompanied by the need for total avionics system
integration, will demand changes in both intra and inter subsystem data transfer as we know it today. These changes
are due to many factors, some of which are:

• Need to eliminate costly hardware/software elements required of centrally controlled data transfer
systems.

• Dispersion of microprocessors within subsystems, necessitating the interchange of processed data between
subsystems.

• Need for the generation of an aircraft data base, available to all subsystems, which includes all airframe/
mission parameters.

• Maximizing the use of common sensor data.

• Making maximum use of multifunctional Control/Display (C/D) elements.

• Allowance for further standardization of hardware/software elements by use of other MIL-STDs, e.g., 1553,
1589, 1750, and 1760, for interchangeability between weapon systems and aircraft.

Present day military aircraft employ only single level, centrally controlled, command/response type information
transfer systems. Those aircraft with multiple ITSs which require interchange of data communicate with one
another via mutually supported memory storage interface units. With subsystems integrated in this manner, a
change in one results in unpleasant ripple effects progressing throughout the others. This is due to the changes
necessary in the centralized command/response software packages in each of the supporting ITSs.

A solution to this problem is the development and use of an ITS which will efficiently interconnect, in a hierarchical
order, multilevel multiplexed ITSs. With such an approach a higher order operating system (Mission Management)
can be created which provides the processing of functions required of multi subsystem inputs. Such a high speed
Higher Order Transfer (HOX) system, employing contention protocol, will provide each lower level ITS a functionally
isolated communication medium whenever data interchange is required.

The extensive use of MIL-STD-1553 bus networks has proven the concept of multiplexed data transfer systems to
achieve a degree of integration. Unfortunately, the 1553B protocol does not provide the characteristics (speed and
protocol) needed to efficiently operate with future hierarchical/multilevel networks. MIL-STD-1553B characteristics are
ideally matched to many intra avionics subsystem data transfer requirements which necessitate sensor data
collection, central processing, then distribution of results to peripheral areas, e.g., Electrical Power Control, Flight
Control, Propulsion and Stores Management subsystems. There will be continued use of 1553B bus networks for
intra subsystem data transfer.

In the next decade we can expect some well known subsystems to be combined and the appearance of new ones. Each
major subsystem will have its own intra multiplexed bus network. All these asynchronously operating ITSs will need to
be interconnected to create an integrated data base. Such a data base will maximize use of common data and allow
for continuing changes in the subsystems and total airframe/weapon tasks with minimum disturbance to the higher
order ITS. Characteristics of the HOX system must provide for isolation of flight critical functions, allow for
independent design, production and test of subsystems, and incorporate distributed bus control to eliminate costly central
hardware/software complexes.




HIERARCHICAL  ITS  ARCHITECTURE 


The composition of an avionic subsystem suite and the aircraft's mission role will determine the configuration of the
higher order ITS that should be used. Examples of hierarchical/multilevel ITS configurations are shown in the
following figures, each with a different bus topology. Figure 31-1 shows this architectural thinking by incorporating
three levels of ITS to interconnect the full complement of avionic subsystems and units shown. These levels,
being identified as bus networks, operate under their own control independent of other bus networks.


Figure 31-1. Three Level Information Transfer System (ITS) with Dual Higher Order Transfer Networks


Figure 31-2. Two Level Information Transfer System (ITS) with Dual Higher Order Transfer Networks


Operation of the multilevel networks of Figure 31-1 is illustrated by the transfer of data from a lower network to the
highest. The Navigation (NAV) subsystem, incorporating an intra-1553B bus network (Level 1), interfaces with
the other two avionic subsystems and the Mission Computer via the Avionics Bus Network (Level 2), using contention
protocol. At this higher level, processing can be accomplished on data functions common to all three avionic
subsystems. At the third and highest level (Level 3) the Mission Computer interfaces with the remaining subsystems
and common sensors via either the Global or Common Sensor buses using contention protocol. At this highest level
data processing can take place that could involve any combination of subsystem functions. Note that the two flight
critical subsystems (Flight Control and Propulsion) employ a common bus (Flight Critical), employing contention
protocol, as their means of integration. The Flight Control Processor provides the interface to the two higher order
buses. There is a significant difference in the ITS configuration of the two subsystems. Whereas the Propulsion
subsystem contains its own 1553B MUX bus network, the Flight Control subsystem employs the Flight Critical Bus as its
intra subsystem network, including the Propulsion Control Processor.

Such a topology provides functional isolation between the two flight critical subsystems and from all other subsystems
and common sensors. A point of interest is that if a signal function originating in any of the three major avionic
subsystems (NAV, COMM., FLT-INSTR.) is required in the propulsion subsystem, it must be processed through five (5)
bus networks.
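The count of five bus networks can be checked with a short Python sketch. The adjacency table below is one reading of the description of Figure 31-1 (the figure itself is not reproduced here), so the bus names and links are assumptions; the Common Sensor Bus is omitted because it offers a path of the same length as the Global Bus.

```python
from collections import deque

# Bus networks that share an interfacing processor, as read from the text.
LINKS = {
    "NAV 1553B": ["Avionics Bus"],
    "Avionics Bus": ["NAV 1553B", "Global Bus"],
    "Global Bus": ["Avionics Bus", "Flight Critical Bus"],
    "Flight Critical Bus": ["Global Bus", "Propulsion 1553B"],
    "Propulsion 1553B": ["Flight Critical Bus"],
}


def bus_path(start: str, goal: str):
    """Breadth-first search for the shortest chain of bus networks."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route between the two networks
```

Calling `bus_path("NAV 1553B", "Propulsion 1553B")` returns a chain of five networks under these assumptions, in agreement with the figure quoted in the text.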

Figure 31-1 presents a hierarchical structure ITS with two HOX bus networks (Global and Common Sensor Buses)
operating at the highest level, each with a specific assignment: the Global Bus is used for interconnection of major
subsystems, and the Common Sensor Bus provides common sensor data to these same subsystems in a broadcast
operating mode. These HOX networks operate with contention protocol and provide the functional isolation so important
to flight critical subsystems.

Figure 31-2 presents another ITS configuration which incorporates only two levels of bus networks. It is comprised
of two higher order bus networks, incorporating contention protocol, supporting their functionally related subsystems.
Each subsystem (excluding the ICS-Command/Control) incorporates its own 1553B bus network. One HOX network,
identified as the Flight Safety Bus, provides data integration between those subsystems and sensor/units involved in
basic flight of the aircraft. The Mission Bus provides the integration medium between mission/weapon type
subsystems which have no direct function with basic flight.

Data/signal functions originating in the Mission Bus subsystems, required by the Flight Control and Propulsion
systems to assist in implementing mission/weapon tasks, are transferred through the Mission Management Processor
(MMP). The MMP processes and/or provides the storage for data transferred directly between the two HOX
networks. The MMP broadcasts its data to the respective bus which it is serving, like other subsystems on the
buses. The Common Sensors have data paths to both HOX networks, as do the Control/Display (C/D) subsystem(s),
to provide maximum redundancy for both basic flight safety and mission functions.

Figure 31-3 illustrates the data flow through the hierarchical/multilevel bus network of Figure 31-2 that could occur
with the detection of a hostile missile track in the Defensive Weapon Subsystem. Such detection, with appropriate
aircraft/mission action, would require data transfer and processing at various ITS levels and involve many
subsystems. The most critical would involve the transfer of data and command functions to the Flight Control and
Propulsion subsystems to effect aircraft evasive action.


Figure 31-3. Illustration of Data Flow Between Hierarchical/Multilevel Bus Networks


Worthy of note is the postulated appearance of new subsystems and the functional combining of others. Such aircraft/
system functions as Fire Control, Stores Management, AMAC, Weapon Guidance, and Release have been combined
into one subsystem which contains its own 1553B bus network. A new subsystem identified as the ICS/Command
Control subsystem was created to handle digital voice and audio control functions and to facilitate integration with the
JTIDS digital audio channels. This subsystem will provide voice synthesis, and the all important voice recognition/
command function, to improve the crew member's effectiveness by reducing his manual workload. This subsystem,
unlike the majority of the others, would not incorporate a 1553B type ITS since its data transfer characteristics are
not compatible with centralized command/response protocol but are with the contention protocol of a HOX system.

Two subsystems are identified as Control/Display (C/D) in Figure 31-2. These subsystems consolidate all those C/D
functions that have previously been dedicated to the various avionic subsystems. Whether two are required depends
on the size and mission tasks of the aircraft. The design concept and functional operational configuration of the C/D
area is complicated because the cross-disciplinary engineering requirements are presently divided between many
organizations, i.e., Human Factors, Flight Dynamics, Avionics, Propulsion, etc. One can foresee the need of
single command/control selections via pushbuttons, e.g., the selection of a required attack mode in the Fire Control
System which would result in an associated autopilot response, preparation of appropriate weapons, selection of a
Radar System, selection of HUD symbology, selection of other displays, report via JTIDS, etc. Further
investigation will reveal whether C/D subsystems can be serviced by intra-1553B bus networks with special video switches
and symbol generators or require a high speed ITS incorporating possible FM/FM operation on the bus or with the
bus accepting processed/compressed digital video.

An imposing task even now confronting avionic system integration is the requirement that new aircraft integrate
Flight Control and Fire Control subsystems to improve weapon delivery and gun laying accuracy. This integration
task is complicated by the need of an integrated interactive Propulsion/Flight Control System. One wonders how
many cooks are going to be stirring this pot. The complications of the aforementioned task can be simplified by
employing a higher order ITS to interconnect these critical avionic subsystems. Such an ITS, incorporating contention
type protocol, can preserve subsystem integrity, provide the avionics subsystem manufacturers independence in
subsystem development, but most importantly, provide the means for integration.

Figure 31-4 is presented to show the physical and functional impact that a supposedly minor change in the physical
configuration of the bus network can have on subsystem functional isolation, hardware minimization, dependence on
state-of-the-art technology, etc. The connection of the two-HOX configuration (Figure 31-2) into a single HOX
network (Figure 31-4) is certainly feasible, functionally. The single HOX network provides certain advantages over the
dual configuration but also adds certain disadvantages. In-depth investigation would be required to determine if the
advantages gained in the single network, primarily in elimination of hardware, would offset possible negative features
of less isolation between flight and mission subsystems and less data path redundancy.




A significant saving could be obtained, using the single HOX network, by the elimination of hardware necessary to
interface common subsystems and sensors to the dual HOX networks. Also, the Mission Management Processor
(MMP) no longer need act as the transfer and storage agent for the interchange of data between the two HOX networks.
The state of technology within the time frame of hardware implementation must be considered. The numbers shown
in the squares at each bus-tap connection in Figures 31-2 and 31-4 give the quantity of tap/drops required of each
subsystem and sensor/unit per bus network. In the dual HOX network configuration of Figure 31-2, a single bus in
the Flight Safety Bus network has 26 tap/drops while a single bus in the Mission Bus network has 28. Fiber optics
could not be used as the bus medium today. The single HOX network (Figure 31-4) has 42 tap/drops per bus. Could
such a bus make use of fiber optics? Not unless there is a major breakthrough.

AVIONIC SUBSYSTEMS/ITS CONFIGURATIONS

Each future avionics subsystem will have special data processing and transfer characteristics that will require certain
intra- and inter-subsystem bus network configurations and protocol characteristics. Accepting the premise that each
subsystem will contain its own multiplexed ITS and incorporate a data transfer interface to a HOX system, then each
subsystem's physical and functional characteristics must be evaluated for data rates, accuracy, redundancy, isolation,
fault tolerance, etc., before the optimum intra-/inter-ITS configuration can be determined.

Figure 31-5, an expanded version of Figure 31-2, illustrates various subsystem intra-bus networks and weapon
system functions associated with each major subsystem. Figure 31-5 also presents a more in-depth picture
illustrating the many different MIL-STD-1553B bus network configurations that can be adapted to the particular physical/
functional requirements of the subsystem, remembering that there can be many network configurations developed,
based on the ground rules and weighting factors selected. It is beyond the scope of this paper to present the reasoning
for the choice of the different subsystems' intra-/inter-network configurations, although a Flight Control subsystem
configuration is presented as an example of how subsystem characteristics and requirements can determine a selected
ITS configuration and how the subsystem may use the HOX network to simplify its intra data-transfer task.




Figure 31-6 depicts a multiplexed, fly-by-wire Flight Control System (FCS); its intra/inter bus network configuration
is the result of tradeoffs that will be discussed. The FCS shown employs triple redundant 1553B data paths/buses
and interfaces with the two HOX networks via the Flight Control Processors, which integrate the FCS functions with
the aircraft/mission functions. The FCS incorporates cross-strapping of triple redundant sensors/transducers to the
three single channel MIL-STD-1553B buses. Cross-strapping provides each Flight Control Processor (FCP) all
inputs via its dedicated single channel Bus Controller (BC), bus, and Remote Terminals (RT). The cross-strapping
technique eliminates system hardware. For each FCP to receive all inputs from the triple redundant sensors without
cross-strapping, each FCP would require three BCs for connection to the three buses and each RT would also require
three BIUs. The cross-strapped approach would probably meet the isolation/fault criteria required of a multiplexed
fly-by-wire FCS, and it certainly reduces the hardware complement significantly, but one critical data path function
is missing. Interchange of data between the FCPs, for voting and redundancy criteria, is not possible in the intra-bus
network shown because it lacks intertie of the FCPs. This problem is solved free of charge by the HOX bus network,
which provides each FCP a redundant data path to the other FCPs; it is free of charge because the FCPs must have this
connection to the HOX buses for aircraft/mission functions. Each FCP, through its BIU, can by contention protocol
broadcast on the HOX buses the data required by the other two FCPs. This subsystem configuration is given as an
example of how multiplexed bus networks, operating at different data rates and protocols, can be intertied in a
complementary manner, and of how the characteristics of each, even in a hierarchical/multilevel ITS, can provide
functions not attainable in a single level bus network.


HOX SYSTEM MIL-STD DEVELOPMENT

A Task Group (TG), formed within the SAE/A-2K Subcommittee on Multiplexing, has accepted the task of generating
a MIL-STD for a Higher Order Transfer Information Transfer System. The TG will need help, especially assistance
and information from various agencies and organisations that are responsible for:

• Flight dynamics

• Avionic systems

• Propulsion/power generation and distribution

• Human factors and resources

• Standardization

Basic information needed will include:

• Intra- and inter-subsystem data bases covering aircraft/mission subsystem configurations for various
aircraft sizes and mission roles.

• Processing tasks/requirements of the Mission Management Processor(s).




With the information described above, the TG can conduct trade-off analyses covering such HOX ITS areas as:

• Operational protocol - involving various addressing schemes, data flow paths, variable transfer speeds
and message lengths, etc.

• Bus network and unit characteristics - involving bus length, number of bus tap/drops and bus/unit
electrical characteristics.

• Bus network intertie and topology - involving means of providing the required level of unit/bus
redundancy and fault isolation, plus the functional/physical intertie of hierarchical bus networks.

• Forecast of available technology.

It took ten years to develop MIL-STD-1553B. Its formulation was greatly assisted by the parallel development and
use of various types of airborne multiplexed systems during the same period. The generation of a HOX ITS MIL-STD
will be more difficult since there will be little development of hierarchical ITS during the period in which the
MIL-STD is generated. A significant factor making the effort difficult will be the ever changing functions of
subsystems and the creation of new ones. The progress of the TG and the quality of the MIL-STD will depend upon the
quality of information presented to it by the many military organisations whose task is to give direction and identify
needs, and by the industrial companies that manufacture the hardware/software elements. Otherwise the MIL-STD
will be a de facto standard generated by the TG.

The first action of the SAE/A-2K in developing the HOX ITS MIL-STD was an open-house meeting held at
WPAFB, Ohio, USA on 11 March 1980. Thirty-three (33) people attended the meeting and expressed their views on
the MIL-STD development program. Each person attending was requested to submit what they believed to be the
basic requirements of the HOX system, including desired characteristics. The next meeting will be held in conjunction
with the National Aerospace and Electronics Conference (NAECON) in May of this year.

CONCLUSIONS 

A new Higher Order Transfer (HOX) type system (defined by a MIL-STD) is needed as the medium to interconnect
avionic subsystems for total weapon system integration on next generation aircraft. MIL-STD-1553 data bus systems
have given us a good start in the art of multiplexing and they should continue to be used to the maximum extent
possible to provide subsystems with their intra-data transfer requirements. There is a need to tie all these
independently operating subsystem bus networks together, and this can be accomplished by higher order systems operating
between 1 and 50 MHz (possibly variable) with a contention type protocol. Effort is already underway by a Task
Group sponsored by the SAE/A-2K Subcommittee on Multiplexing to generate a MIL-STD covering such an ITS.

Information is requested by the TG covering new avionic subsystem functions/configuration requirements and
associated data base lists. We must remember that the TG is composed of volunteers and that its accomplishments will be
determined by the support it receives.


GLOSSARY 

AMAC 

Aircraft  Monitor  and  Control 

ATARS 

Automatic Traffic Advisory and Resolution System

BC 

Bus  Controller 

BCAS 

Beacon  Collision  Avoidance  System 

BIT 

Built  In  Test 

BIU 

Bus  Interface  Unit 

C/D 

Control  Display 

DABS 

Discrete  Address  Beacon  System 

DoD 

Department  of  Defense 

FCP 

Flight Control Processor

FCS 

Flight  Control  System 

FM 

Frequency  Modulation 

HOX 

Higher  Order  Transfer 

HUD 

Heads  Up  Display 

ICS 

Intercommunication  Subsystem 

ITS 

Information  Transfer  System 

JTIDS

Joint  Tactical  Information  Distribution  System 

MLS 

Microwave Landing System

MMP 

Mission  Management  Processor 


RT

Remote Terminal

SAE

Society of Automotive Engineers

SMS

Stores Management System

TG

Task Group

WPAFB

Wright Patterson Air Force Base

Weather Radar


REFERENCES 

Bain, J. M., 1978, 'The Impact of Fiber Optic Multiplexing on Distributed Avionics Architecture,' 1978 Data Bus
Conference, ASD-TR-78-84.

Betts, R., 1980, '50 MBPS Fiber Optics Data Bus,' SAE/A-2K Presentation/Paper, IBM.

Gross, J. P., Broadhead, S. L., Moore, J. D., 1980, 'IMUX: High Speed Communication Bus,' SAE/A-2K
Presentation/Paper, S.C.I., Inc./University of Alabama.

Husbands, C. R., 1979, 'Airborne Integrated Communication System,' 3rd Digital Avionics Systems Conference.

Smith, L. A., Crossgrove, W. A., Dervey, D. E., 'Advanced Avionic Systems for Multimission Applications,'
AFAWE Report F38615-77-C-1252, Boeing Military Airplane Company.

Swaney, R. E., 1980, 'C3I Data Bus,' IR&D Report, Hughes Aircraft Company.

Whiting, J. H., 1979, 'Military Aircraft Avionics in the 1980's,' Standardization in Military Avionics System
Architecture Symposium, WPAFB, Ohio.

Metcalfe, R. M., Boggs, D. R., 'Ethernet: Distributed Packet Switching for Local Computer Networks,'
Communications of the ACM, July 1976.




DISCUSSIONS
SESSION VI

REFERENCE NO. OF PAPER: VI

DISCUSSOR'S NAME: Erwin Gangl, WPAFB, USA
AUTHOR'S NAME: I. Moir (P. Duke, presenter)

COMMENT: You seem to imply that MIL-STD-1760 does not totally satisfy your requirements and is too
complex. Since this standard is in the final coordination cycle, do you have any additional inputs?
Concerns?

AUTHOR'S REPLY: British Aerospace-Brough had a number of comments to make on MIL-STD-1760. Our
comments were sent to Smiths Industries, who were tasked with collating a UK industry response. This
response was duly sent to the US, but British Aerospace-Brough have had no further information on the
state of 1760. In view of the serious implications of some of our comments, I would be very interested
to see a revised version of this standard, hopefully before it is "frozen."


REFERENCE NO. OF PAPER: VI-28
DISCUSSOR'S NAME: CDR Strada, USN
AUTHOR'S NAME: Moir (P. Duke)

COMMENT: How do you intend to accomplish "tuning the system to pilot capabilities"? How would you
handle the calibration constants/initial conditions that the weapon-aiming system needs to know about
the weapon? These constants may vary from aircraft to aircraft as well as weapon to weapon.

AUTHOR'S REPLY: (1) The pilot's capability to interact with the aircraft system was seen from the
outset to be very important. To give a brief history, the Rig has grown from two activities at British
Aerospace Brough. The first was research into data bus systems from an avionic point of view and the
second was the development of an advanced cockpit for a single-seat tactical combat aircraft. The
latter has been used to perform ergonomic assessments of new control and display concepts. Extensive
"outside world" and data analysis features have been added to provide a complete tool for the
assessment of future systems. "Tuning" the system will be achieved by modifying the cockpit and
avionic systems. The avionic systems will initially be simulations, building up through emulations to
"real" hardware and software. Hence, modification during the early development of the system will be
relatively cheap.

(2) Navigation parameters will probably be input to the aircraft using a portable
on-board data source. The extension of this to weapon data is controversial and requires study. In
the absence of a method for modifying the constants, I would suggest more use of the role change
philosophy. Certain LRUs which contain preset data should be easily replaced. In general, the rule is
that the most feasible system will contain the minimum hardware and software dedicated to a specific
weapon type.


REFERENCE NO. OF PAPER: VI-28
DISCUSSOR'S NAME: J. F. Ferrer), DASSAULT
AUTHOR'S NAME: Moir (P. A. Duke)

COMMENT: How do you intend to solve the problem of analog interfaces, and in particular the use of discrete
signals, taking account of the standardization you are looking for?

AUTHOR'S REPLY: This depends on the aircraft/launcher/weapon interface. However, for conventional
weapons the analogue and discrete signals would be generated within a Pylon Interface Unit by D to A
conversion or switching of power supplies. Only power supplies and digital data would be input to the
Pylon Interface Unit.


REFERENCE NO. OF PAPER: VI-29
DISCUSSOR'S NAME: Schoelch, IABG
AUTHOR'S NAME: HEGER

COMMENT: How do you achieve tolerance against interruptions of the fiber optic bus?

AUTHOR'S REPLY: In normal ring operation one transmission direction is used (out of two possible). In
this operation mode several messages can be conveyed simultaneously using the principle ODD; the
messages run from the source station to the destination station, where they are absorbed. On line segments
where no message transmission takes place, special delimiter symbols are transmitted. In case of a line
interruption (broken transmission medium in both directions or faulty station) this is detected by the
adjacent station by means of a time-out. This station initializes the reconfiguration procedure; this
leads to two stations which recognize themselves as being adjacent to the interruption. These two
stations reverse the transmission direction alternately and periodically by means of special
broadcast messages. Messages having certain destination addresses are only transmitted within the
respective period with the adequate transmission direction. By the reversing procedure no messages are
lost; the message flow is only chopped, so the average throughput is not affected in the main,
but the average transmission times become longer. When an additional interruption happens, the same
procedure as described is performed. When a station at the end of the physical line receives bits from
the so far interrupted line (again), the new or repaired part of the line with its stations is
identified and coupled (again). And finally, when all interruptions of the ring bus are closed, the
system reinstalls the ring structure with an arbitrary transmission direction.

In the case of the interruption of only one of the two transmission directions, the other direction
is selected and full performance is guaranteed.

Besides interruptions, direct effects are also detected, and the respective line reconfiguration
takes place as well.

And finally it must be pointed out that all changes of the configurations are reported system-wide
by means of broadcast status reporting messages, and so the line and system status is displayed on the
display of the master control panel in order to be able to inform the repair personnel effectively.


REFERENCE NO. OF PAPER: VI-30
DISCUSSOR'S NAME: G. H. Hunt, RAE
AUTHOR'S NAME: Megna

COMMENT: In your paper you mention the objective of comparing the dispersed sensor mesh system with
the existing dedicated system already developed for the F-8 fly-by-wire aircraft. Could you state
whether this comparison has yet been made and give an indication of the results obtained?

AUTHOR'S REPLY: An F-8 iron bird facility was used which has the flight system as a part of it; the
network is in parallel with that. The comparison is made based upon examination of the functions
performed by the two different implementations.


REFERENCE  NO.  OF  PAPER:  VI-30 

DISCUSSOR'S  NAME:  Alan  Stern,  Boeing  Co.,  USA 

AUTHOR'S  NAME:  V.  Megna 

COMMENT: You seem to be attempting to solve the problem of interfacing sensors and actuators to flight
control system computers using a flight safety reliable bus. Why isn't a redundant 1553B approach good
enough for this? What is the advantage of your approach relative to 1553B?

AUTHOR'S REPLY: There are a number of bus problems which we hope to avoid through the use of a
network. One is physical damage which results in loss of communication to units beyond the break;
another is the problem of some subsystem disabling the bus by constantly transmitting on the bus. But
basically we are investigating the use of a network concept to establish a data base upon which to make
decisions as to the best method for interconnecting avionic subsystems.


REFERENCE NO. OF PAPER: VI-30
DISCUSSOR'S NAME: Horst Kister, VDO
AUTHOR'S NAME: V. Megna

COMMENT: The controller is the most critical part of the system in respect to safety. If each of the
terminals were able to disconnect the bus from itself without any handover mechanism (not like dynamic
bus control, but "automatically"), would that help?

AUTHOR'S REPLY: Distributing bus control complicates the problem. Central bus control makes for less
complexity, but requires extra reliability consideration. Complexity comes about when control is
passed.



REFERENCE NO. OF PAPER: VI-30
DISCUSSOR'S NAME: K. Brammer, ESG
AUTHOR'S NAME: V. Megna

COMMENT: The communication network shown by you has 6 nodes. If they were all mutually connected to
each other, every node would have 5 ports to the other nodes and there would be a total number of 15
links (i.e., N(N-1)/2 links with N=6). You did not choose this maximum configuration, but some
configuration between the maximum and the minimum possible. Is there a particular reason for the
selection of your configuration? Has it to do with a specified degree of redundancy (e.g., for flight
safety) or did you design for a specified cut set (the minimum number of link failures that cause the
net to split into two separate parts)?

AUTHOR'S REPLY: Every node has A ports and the controller has 4 ports. With this configuration you do
not end up with unconnected ports; all will be used. Otherwise, you end up with a port that cannot be
connected to anything else.
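
The link count quoted in the comment, and the cut-set measure it asks about, can be checked with a short
sketch. The two topologies below (a fully connected net and a simple ring of 6 nodes) are illustrative
assumptions only and are not the configuration of the paper; the function names are hypothetical, and the
cut-set search is a brute-force method that is practical only for small nets.

```python
from itertools import combinations


def full_mesh_links(n):
    """Number of links when each of n nodes is connected to every other node."""
    return n * (n - 1) // 2


def is_connected(nodes, links):
    """Depth-first search over undirected links; True if every node is reachable."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {node: set() for node in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def min_cut_size(nodes, links):
    """Smallest number of link failures that splits the net (brute force)."""
    links = list(links)
    for k in range(len(links) + 1):
        for removed in combinations(links, k):
            remaining = [link for link in links if link not in removed]
            if not is_connected(nodes, remaining):
                return k
    return None  # the net cannot be split (e.g. a single node)


# Illustrative topologies (assumptions, not the network of the paper).
nodes = range(6)
full_mesh = list(combinations(nodes, 2))        # maximum configuration
ring = [(i, (i + 1) % 6) for i in nodes]        # a sparse configuration

print(full_mesh_links(6), min_cut_size(nodes, full_mesh), min_cut_size(nodes, ring))
```

A configuration between these two extremes trades link count against the number of link failures that can
be survived before the net splits.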


REFERENCE NO. OF PAPER: VI-30
DISCUSSOR'S NAME: E. Gangl, WPAFB, USA
AUTHOR'S NAME: Megna

COMMENT: You mentioned in your discussion that MIL-STD-1553 LSI hardware was not available and also
that there was no standardization guidance on fiber optic bussing. I would like to mention that the US
is considering a fiber optic version of 1553 (probably will be MIL-STD-1773) for publication. Also,
LSI 1553 terminal hardware will be available this year from several sources: in the US from
Harris Corp., Collins, Grumman, and Circuit Technology, Inc., and in the UK from Smiths and Marconi
Electronic Devices.

AUTHOR'S REPLY: As I mentioned in my presentation, at the time that we were designing and constructing
our system, MIL-STD-1553 LSI chips were not available and therefore they were not included. This does
not preclude the addition of 1553 LSI chips when they are available. As far as a specification for
1553 fiber optics is concerned, the system which we have built is an engineering model to test out our
network concepts. If we do go on to build a flight system, we will take into consideration any fiber
optic standard which exists at that time.

REFERENCE NO. OF PAPER: VI-31

DISCUSSOR'S  NAME:  Alan  Stern,  Boeing  Co.,  USA 

AUTHOR'S  NAME:  J.  McCuen 

COMMENT: Because it is desirable to reduce the number of buses and interfaces to a minimum, and
because MIL-STD-1553 is strongly encouraged even within flight control systems, it is desirable to
provide a 1553 mode which is a "contention scheme." This would prevent the need for additional bus
redundancy management in flight safety critical systems.

AUTHOR'S REPLY: Identify a contention Information Transfer System (ITS) as one wherein each Remote
Terminal (RT) has the means of acquiring the bus under its own control. There is no central control
needed or allowed, as there is in a 1553 system.

Since a 1553 system has central control, even when operating in a "dynamic bus control" mode, it cannot ever
operate as a pure contention system. Even a 1553 system operating in a dynamic control mode must keep
handing off control from RT to RT in a set sequence. The U.S. tri-services have determined there will
be no change in 1553B, i.e., no 1553C allowed.

Also, a 1553 bus cannot provide the functional isolation from other subsystem (RT) failures as can
a contention bus system. Bus management in a contention-type system would be minimal.


SIFT - AN ULTRA-RELIABLE AVIONIC COMPUTING SYSTEM

Kurt  Moses 
Bendix  Corporation 
Flight  Systems  Division 
Teterboro,  New  Jersey,  USA 


SUMMARY 

SIFT (Software Implemented Fault Tolerance) is an ultra-reliable computing system that is
designed for flight-critical control and avionics applications. A typical application
would be a fly-by-wire control system for civil or military aircraft. SIFT is based on
a multi-processor architecture that achieves fault tolerance by replicating computing
tasks among processing units. Error detection and system configuration are performed by
software to maintain the operational integrity of the computing system. SIFT has been
designed to meet a system failure probability goal of 10^-10 per hour.

SIFT operation requires a high speed inter-computer communication system. This is realized
by dedicated serial links arrayed in a star connection, i.e. every processor broadcasts
to and receives data from all other processors in the complex. Care has been taken
that no delay due to contention for ports, buses and processors limits system operation.

Computing is carried out by high speed, 16 bit Bendix 930 processors, which have a throughput
of approximately 800 KOPS based on an appropriate flight control instruction mix.
Each processor has a 32K memory associated with it.

Software  algorithms  are  used  for  failure  detection  by  means  of  voting,  failure  isolation 
to  the  faulty  processor,  and  reconfiguration  after  fault  detection.  Frame  synchronization 
between  processors  is  employed  to  reduce  data  skew  and  minimize  false  alarms. 

This paper describes the architecture of SIFT, its hardware implementation, and the unique
test stand used for evaluation. Potential applications of this technique to current and
anticipated ultra-reliable electrical flight control systems are given.

The work presented in this paper was done by Bendix Flight Systems Division for SRI
International under NASA contract number NAS1-15428. This work is being sponsored by
NASA Langley Research Center.

1.  INTRODUCTION 

Automatic flight control systems, which once provided mainly pilot-relief functions, have
in recent years taken on flight-critical tasks, i.e. tasks whose successful accomplishment
is vital to the safety of the aircraft. Automatic landing under low visibility/
ceiling conditions was one of the first of these flight-critical tasks to be imposed on
the AFCS. More recently, "fly-by-wire" (electrical) control systems have taken the place
of conventional mechanical controls, and the Control-Configured Vehicle (CCV), which
achieves the desired flying characteristics at least partly by means of electronic controls
rather than solely by aerodynamic configuration, has made its debut. The desire
to reduce fuel consumption has given a powerful impetus to the use of electrical flight
controls, since this permits the unaugmented aircraft to be designed in a minimum drag
configuration. The vehicle then achieves satisfactory flying qualities through the use
of the electrical flight control system. Mechanical controls then become superfluous in
such a vehicle since, without the electrical system, the vehicle is uncontrollable
(unflyable). Obviously, a flight control system entrusted with such tasks must be
ultra-reliable, i.e. its reliability must be of the order of that of the basic aircraft structure.

These considerations have led to the development of SIFT, which achieves the failure
probability of 10^-10 per hour through the use of software-implemented fault tolerance
techniques and hardware redundancy.

While primarily motivated by the requirements of flight control and related flight-critical
applications (e.g. flutter control, engine fuel control etc.), SIFT can be used
in the context of the total avionic system for both flight-critical and non-critical
tasks to achieve an overall avionic system that may be more economical than the present
accumulation of separately designed LRUs. These often cannot even communicate with each
other, much less substitute for one another. With SIFT, it is possible to substitute a
failed processor that was performing a critical task with one that is performing a less
critical task, and processor inter-communications are handled in a routine manner. SIFT
is a multi-failure-survivable, multi-processor computer array that utilizes dedicated
ports and busses for all interprocessor data transmissions so that there are no major
delays due to contention. All fault detection and reconfiguration algorithms are
implemented in software.

Each processor communicates with the other processors over bit-serial busses by
broadcasting its computed data. This data is validated by means of 3 or 3 fold voting with
presumably identical data broadcast by the other processors. Voting is done exclusively
by software. Majority voting is most effective if the values subjected to vote are
identical except for errors. The computed results can only be expected to be identical
if the programs receive identical inputs. This in turn requires some degree of
synchronization between processors and, further, requires a basic strategy to ensure input data
consistency. These requirements impose a large interprocessor communication load on the
bus system, which could lead to unacceptable delays due to contention for bus or data port
access in multiplexed busses.
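
The voting step described above can be illustrated with a minimal sketch. It assumes exact-match
majority voting over the copies of one data word broadcast by the replicated processors; the function
name and the returned pair (voted value, list of dissenting copies) are hypothetical and are not taken
from the SIFT software.

```python
from collections import Counter


def majority_vote(values):
    """Exact-match majority vote over the copies of one computed data word.

    Returns (winner, dissenters): the value held by a strict majority of the
    copies and the indices of the copies that disagree with it. If no strict
    majority exists, winner is None and every copy is reported.
    """
    counts = Counter(values)
    winner, votes = counts.most_common(1)[0]
    if votes <= len(values) // 2:
        return None, list(range(len(values)))
    dissenters = [i for i, v in enumerate(values) if v != winner]
    return winner, dissenters


# Three copies of the same word, one of them corrupted (illustrative values).
voted, suspects = majority_vote([0x1A2B, 0x1A2B, 0x1A2F])
print(hex(voted), suspects)
```

Because the comparison is exact, the sketch only works when the replicated programs receive identical
inputs, which is the reason the text calls for synchronization and input data consistency.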

External I/O information is transferred by MIL-STD-1553A serial links. Time division
multiplex controllers govern the data flow to and from aircraft actuators and sensors.
There is one controller for each processor and 1553A bus. Each 1553A controller and bus
can support up to 32 remote terminals with associated actuators or sensors.

The SIFT hardware design, build, and test effort was the responsibility of Bendix Flight
Systems Division, under contract to SRI International, which is the prime contractor to
NASA Langley Research Center under NASA Contract NAS1-15428.

2.  SYSTEM  DESCRIPTION 

The present system has been designed to accommodate up to 8 processors, a Software
Development System, fault tolerant redundant power supplies and 8 1553A terminals that can
connect the SIFT processors to sensors, actuators, controls and displays. Each processor
is capable of executing a complete control program, typically including fly-by-wire
control, stability and control augmentation, autopilot modes including autoland functions,
navigation, guidance, etc. Each processor also executes the redundancy management
algorithms, including fault diagnosis and reconfiguration strategies, as well as the executive
program. Although not included in the SIFT development program, a preflight and
maintenance BIT program will be required for most operational avionic applications of SIFT.

Typically, a flight-control application program includes the processing of sensor data
and control inputs (filtering and otherwise shaping the data, voting and comparing
of redundant data); the generation, by means of the applicable control laws, of actuator
and instrumentation commands and other outputs; the engage, disengage and mode control
logic; fault and other types of warning displays; and ensuring the integrity of the
output commands by appropriate monitoring and switching logic. In addition, and as a
characteristic of SIFT, all computed data that is transmitted between processors is
subjected to the software voting algorithms, and failures in any processor are communicated
to all processors.

Figure 1 illustrates the arrangement of the SIFT architecture and shows the interprocessor
serial bus structure and the I/O data link which communicates with the other
constituents of the aircraft flight control system. The I/O data link is a MIL-STD-1553A bus.
Each bus can communicate with 32 remote terminals.

3. SYSTEM OPERATION

The organization of each computer-LRU is shown in Figure 2. The CPU is a Bendix 930
minicomputer. Computations and broadcasts of data are carried out in an iterative
sequence. The results of the computations are temporarily stored in the scratch pad
memory data file (1K) that is uniquely associated with the processor. Each processor
has associated with it its own program memory. This memory may be read by, but cannot be
written into by, any other processor. The data file can be accessed by the broadcast
transmitter, the receiver, the 1553A data link and the CPU, in this order of priority.
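
The access priority for the data file can be sketched as a fixed-priority arbiter. This is an
illustration only: the text gives the order of priority but not the arbitration mechanism, so the
function below and its unit names are assumptions.

```python
# Order of priority for data file access, highest first, as listed in the text.
PRIORITY = ("broadcast_transmitter", "receiver", "1553a_data_link", "cpu")


def grant(requests):
    """Return the highest-priority unit among those requesting the data file.

    `requests` is any collection of unit names; None is returned when no
    unit is requesting access.
    """
    for unit in PRIORITY:
        if unit in requests:
            return unit
    return None


# The CPU and the receiver request access in the same cycle.
print(grant({"cpu", "receiver"}))
```

With this ordering the CPU is served last, so a pending broadcast or reception is never held up by
processor accesses to the data file.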

This sequence has three phases which control the activities of the system components.

Load Phase. The processor computes its assigned tasks, loads resultant data into its
local "data file", loads the associated destination address into its transaction file,
and loads the starting transaction address into the transaction pointer. The broadcast
sequencer then starts the Broadcast Phase of operation followed by the Receiver Phase in
each destination processor. It should be noted here that the Broadcast and Receiver
phases described below function independently of the processors and do not detract from
the power and speed of the CPUs that make up the SIFT Computer System.

Broadcast Phase. The broadcast sequencer broadcasts a data word (from "data file") along
with the associated destination address (from "transaction file") at a maximum rate of
1 data word/15 microseconds. This broadcast sequence continues until End-of-File (EOF)
is reached in the "transaction file." The flow diagram for this sequence of events is
shown in Figure 3. End-of-File (EOF) is reset by loading the transaction pointer with
the starting address. The 16-bit data file word is then combined with the 7-bit destination
address in the broadcast transmitter (Figure 4). The 25-bit serial word is then
concurrently broadcast to all other processors in the system. The EOF is updated, and
the transaction pointer is advanced to the next transaction if additional data words are
required by the program. Otherwise, the sequence of broadcasts is terminated.
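The Broadcast Phase loop can be sketched as follows. Several details are assumptions made for illustration only: the address is placed above the data in the packed word, each transaction entry is modelled as a (data file index, destination address) pair, and EOF is modelled as a `None` marker. The text accounts for 23 of the 25 serial bits; the remaining two are not described, so the sketch leaves them out.

```python
DATA_BITS = 16
ADDR_BITS = 7
EOF = None  # hypothetical end-of-file marker in the transaction file


def pack(data_word, dest_addr):
    """Combine a 16-bit data word with a 7-bit destination address.

    Assumed layout: address in the high bits, data in the low 16 bits.
    """
    assert 0 <= data_word < (1 << DATA_BITS)
    assert 0 <= dest_addr < (1 << ADDR_BITS)
    return (dest_addr << DATA_BITS) | data_word


def broadcast(data_file, transaction_file, pointer=0):
    """Yield packed words from the transaction pointer until EOF is reached."""
    while transaction_file[pointer] is not EOF:
        src_index, dest_addr = transaction_file[pointer]
        yield pack(data_file[src_index], dest_addr)
        pointer += 1  # advance to the next transaction


data_file = [0x00AA, 0x00BB, 0x00CC]
transactions = [(0, 5), (2, 6), EOF]
words = list(broadcast(data_file, transactions))
```

Restarting the sequence corresponds to calling `broadcast` again with the pointer reloaded to the starting address.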

Receiver Phase. The bit-serial word is transmitted in synchronism with a 4 MHz clock
over busses that are dedicated to each destination processor. The transmitted word is
stored momentarily in dedicated receivers in the destination processors. Here, receiver
sequencers (Figure 5) scan the receivers for full registers, then steer the data words
to the local data file locations indicated by the destination addresses. All receiving
processors receive the same data words and store these data words at the same relative
locations in their local data file. The maximum time to load a received word into the
data file is 9.12 microseconds; the minimum time is less than 1 microsecond.
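The figures quoted for the broadcast and receiver phases can be cross-checked with a little arithmetic. The sketch assumes one bit is shifted per 4 MHz clock period, and it uses a full 1K-word data file purely as an upper-bound example; neither assumption is stated in the text.

```python
BIT_CLOCK_HZ = 4_000_000   # serial bus clock
WORD_BITS = 25             # length of the serial word
WORD_PERIOD_US = 15.0      # maximum broadcast rate: 1 word per 15 microseconds
DATA_FILE_WORDS = 1024     # 1K-word data file (upper-bound example)

# Time to shift one 25-bit word onto the bus at one bit per clock.
shift_time_us = WORD_BITS / BIT_CLOCK_HZ * 1e6

# Time to broadcast every word of a full data file at the maximum rate.
full_file_ms = DATA_FILE_WORDS * WORD_PERIOD_US / 1000

print(f"shift time per word: {shift_time_us:.2f} us")
print(f"full data file:      {full_file_ms:.2f} ms")
```

Under these assumptions the serial shift occupies 6.25 microseconds of each 15-microsecond word slot, which leaves room for the quoted worst-case load time of 9.12 microseconds.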




4. CPU DESCRIPTION

Central Processor. The CPU selected for the SIFT is the BDX-930, the latest in a line
of Bendix series 900 processors. The BDX-930 is a 16-bit, microprogrammed, parallel,
general-purpose machine employing a 2901 bit-slice ALU (Arithmetic Logic Unit). The
architecture, integrated by Bendix with the latest standard "off-the-shelf" MSI and LSI
components, results in a processor specifically tailored for high-speed real-time flight
control computations qualified for military applications.

The  BDX-930  CPU  is  constructed  using  a  family  of  bipolar  micro-processor  devices  supported 
with  low  power  Schottky  MSI  arrays,  thereby  providing  maximum  computational  capability  in 
a  minimum  power  and  size  configuration. 

The computer performs 16-bit parallel arithmetic operations during its micro-command
execution time. To maximize execution speed, an instruction-stream pipeline organization
is used which provides concurrent fetch, decode, and execute operations, together with a
pipelined microprogrammed sequencer and broad micro-control field. Therefore, many
simultaneous functions can be performed at maximum speed.

In addition, there are separate memory address and data buses to increase the throughput
with the memory. To interface with the slower operating speeds of core memories and
various I/O devices, a request/response system is used to lengthen those micro-orders in
which communication with these external elements is necessary. The BDX-930 contains
21 registers which are usable by the programmer. Sixteen of these serve as general purpose
accumulators, while the remaining six include the program counter, switch register,
and four specialized single bit registers.

Accumulators 0 through 15 are used as general purpose accumulators, providing the capability
for most machine operations. The registers are operated upon primarily through
use of a powerful set of inter-register instructions. Provision is also made to utilize
two of the registers as index registers during memory reference operations, and one register
as a stack pointer in stack related operations. In addition, sequential registers
are automatically linked for double precision operations.

The use of high-performance Schottky transistor-transistor logic elements permits extremely
fast internal clocking rates, as high as 16 megahertz (62.5 nsec period). This produces
a CPU cycle time of 250 nanoseconds and an average operations rate of 942 KOPS. Inter-register
ADD is executed in 250 nanoseconds; firmware-based MULTIPLY is executed in
5.1 microseconds.
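The timing figures above are mutually consistent, as a short calculation shows. The derived quantities (clocks per CPU cycle, average time per operation) are inferences from the quoted numbers and are not stated in the text.

```python
CLOCK_HZ = 16_000_000     # internal clock rate
CPU_CYCLE_NS = 250.0      # quoted CPU cycle time
AVG_RATE_OPS = 942_000    # quoted average operations rate (942 KOPS)

clock_period_ns = 1e9 / CLOCK_HZ                   # period of one clock
clocks_per_cycle = CPU_CYCLE_NS / clock_period_ns  # clocks in one CPU cycle
avg_op_us = 1e6 / AVG_RATE_OPS                     # mean time per operation
avg_cycles_per_op = avg_op_us * 1000 / CPU_CYCLE_NS

print(f"clock period:      {clock_period_ns} ns")
print(f"clocks per cycle:  {clocks_per_cycle}")
print(f"average operation: {avg_op_us:.2f} us ({avg_cycles_per_op:.2f} cycles)")
```

On these numbers an average operation takes a little over four CPU cycles, which fits a mix dominated by single-cycle inter-register instructions with occasional long firmware operations such as MULTIPLY.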

The BDX 930 consists of 86 microcircuits mounted on one printed circuit board (approximately
50 in.²).

5. MEMORY

Memory  addresses  are  logically  subdivided  into  mapped  segments  as  shown  in  Figure  6. 

Each  processor's  main  memory  and  stack  contain  30K  words,  each  word  16  bits  long.  This 
memory  holds  the  SIFT  executive  program,  the  application  program,  and  the  control  stack. 

As noted, the significant results of each processor's computations are temporarily stored
in a scratch pad memory data file. Each data file contains 1K data words, each word
16 bits long.

High speed interprocessor communication is provided by separate processor/bus interface
elements which control the bit-serial transmission and reception of data words. The
memory destination of each transmission is provided by the transaction file in each processor.
Each transaction file contains 1K words, each word 16 bits long.

Discrete  Functions.  A  reserved  block  of  8  addresses  is  used  to  address  12  discrete 
functions  that  are  firmware  or  hardware  implemented  (see  Figure  7).  These  functions 
increase  the  power  and  speed  of  the  SIFT  Computer  System.  The  implemented  functions 
include:

•  read  processor  identity  number 

•  set  EOF 

•  read  real-time-clock 

•  write  (set)  real-time-clock 

•  read  1553A  registers 

•  write  1553A  registers 

External  I/O.  External  I/O  information  is  transferred  by  MIL-STD-1553A  serial  links. 

Time  division  multiplex  controllers  govern  the  data  flow  between  aircraft  actuators, 
sensors,  avionics  modules,  and  the  BDX  930  processor.  There  is  one  controller  for  each 
BDX  930  processor  and  1553A  bus.  Each  1553A  controller  and  bus  can  support  up  to  32 
remote  terminals  with  associated  actuators,  sensors,  or  avionics  modules. 

The 1553A controller is a 32-bit-word-sized, microcoded processor. It has address computation
capability, microcoded test routines, ability to program branch and special
purpose registers and ability to operate on a prioritized interrupt or polling basis.




The controller shares memory with one BDX 930 processor through the data file of the processor.
The interface to the data file is parallel by 16-bit word. The interface to the
1553A bus is serial by bit. The 1553A controller consists of two major sections, analog and digital.

The analog section is a waveform and impedance converter. It converts the pulsed digital
data received from the digital section to a 1 megabit serial 1553A bus-compatible signal
for transmission over the 1553A bus. In turn, it converts the received 1553A bus signals
to pulsed digital data that can be processed by the digital section. The digital section
responds to commands from the BDX 930 processor to transmit, receive, or idle. In addition,
it encodes and decodes bus data as required by mode logic. The digital section is
firmware-programmable with respect to parity sense, host processor byte/word requirements,
inter-word gap length and error routines. The present microcoded routines handle these
error situations:


•  early  RT  response 

•  late  or  no  RT  response 

•  incorrect  response  word 

•  inter-word  gap  too  long 

•  invalid  sync  or  parity 

•  Manchester  errors 


The 1553A controller consists of 64 microcircuits and a miniature transformer mounted on
one printed circuit board.


Computer Specifications. One SIFT computer/processor LRU is composed of 9 modules. Computer
functions are allocated to these modules as follows:


MODULE                        FUNCTION

BDX 930                       CPU

Memory #1, #2                 Main Memory, 15K words each

Timing & Control              Timing for CPU
                              Real-Time Clock
                              I/O Bus Interface Logic
                              Control Logic

Memory Interface & Control    Transaction File and Logic
                              Data File and Logic
                              Transaction Pointer
                              Timing for Broadcast (4 MHz)
                              Broadcast Shift Register

Processor Interface           Broadcast Sequencer
                              Receiver Sequencer

Broadcast/Receiver            Broadcast Drivers
                              Receiver Shift Registers
                              Holding Registers

1553A Controller              1553A Controller Functions with BDX 930 Interface

Power Supply                  +5V D.C. 13 amps
                              +15V D.C. 1 amp

Physical Characteristics. Each SIFT LRU conforms to ARINC 404A packaging and is a 4 ATR
short unit. It weighs approximately 13 lbs. and has a computed MTBF of 8,900 hours.
Cooling is self-contained by means of an internally mounted fan. Construction is per
conventional multi-layered printed circuit boards. 25% growth space is provided in each
unit. Power consumption is 90 watts.

Software Development System. Software generated for the SIFT system can be developed on
a Data General Eclipse minicomputer system. The equipment is capable of:

•  compiling,  assembling,  linking  and  testing  the  SIFT  code 

•  loading  the  digital  processors 

•  providing real-time monitoring, timing and alteration of executing programs

•  providing  automated  documentation  and  change  control 

•  providing  automated  module  testing 

•  providing  control  law  evaluation  capability  under  a  simulated  airplane/flight 
profile  environment 

The  equipment  comprising  the  Software  Development  System  consists  of: 

•  a Data General Eclipse minicomputer chassis containing an Eclipse S/230 minicomputer
with 128K bytes of core memory, a floating-point unit, and disk, console,
line printer, and flight-processor computer interface modules

•  a  10-megabyte  disk  unit 

•  a  console 

•  a  line  printer 

•  interface  modules 

•  an  EIA  interface  module 




All operations are performed under software control executed on an Eclipse S/230 Data
General computer. A maximum of eight Bendix BDX 930 computers can be controlled with the
current SDS interface.

The interface between the Eclipse S/230 and the Bendix BDX 930 consists of two modules.

The  first  module  is  a  standard  15"  x  15"  card  that  plugs  into  any  unused  I/O  slot  of  the 
S/230. 

The second module is housed in a rack-mounted chassis that connects with the first module
through a standard EG 4192 paddleboard connector wired to the unused I/O slot of the S/230
computer backplane. The Bendix BDX 930 computers are connected to the rear of the second
module through their respective access panel cables. The Eclipse S/230 governs all transfer
of data to and from the SDS by means of I/O instructions. Resident in the Eclipse
S/230 is the SDS software; software to control a Data General Nova 3/12 computer which
controls the 1553A terminals; and the software to simulate sensors and actuators, as well
as airframe characteristics suitable to a particular application, to demonstrate the capability
to actively control a simulated airplane. Figure 8 shows the overall system configuration
(including the test set-up). The vertical dotted line indicates the hardware
resident in the two electronic bays shown in Figure 8. One controller controls the 1553A
communication link, one DMA controller is needed for the interprocessor bus transmitter,
and the third one for the bus receiver as shown in Figure 8.

Testing. In order to validate the design to the confidence level required by the specification,
extensive tests were planned. These include hardware tests, system tests, and
software validation tests. Hardware tests ensure the correctness of the design implementation,
specifically timings, interface circuitry operations, and integrity of construction.
System tests are conducted both on developmental models of SIFT and on any later
production versions. These tests will exercise SIFT in both open and closed loop modes.
Loop closure is achieved by tie-in to a simulation of aircraft dynamics and sensors on a
general purpose minicomputer (e.g., Data General Eclipse, PDP-11, etc.).

A general block diagram of such an arrangement is shown in Figure 8. The purpose of system
tests is to ensure dynamic stability, achievement of static and dynamic system accuracy
requirements, validation of execution time estimates and inter-sample ripple characteristics,
and observation of system behavior in the presence of injected faults. These
and similar characteristics of the system can only be evaluated in a dynamic environment
that is as similar to the aircraft environment as possible. In view of their cost and
the difficulties of controlling flight test conditions, flight tests are only used for
final system validation and certification, and very rarely for development purposes.
Because SIFT depends on correct software, not only for the application program but,
specifically, for voting, fault detection and isolation, and reconfiguration, it is essential
that the software be validated to an extremely high level of confidence. Software
quality will be enhanced by the use of a structured language for the higher level software.
Extensive software tests are planned on the prototype SIFT system and these tests
have been proceeding at SRI International. These tests will encompass complete flight
conditions, environmental conditions, single and multiple failures, crew inputs, etc.

Later  on,  much  of  the  testing  will  also  be  performed  at  NASA  Langley  Research  Center. 

The SIFT hardware has undergone extensive testing, and an effective 'on line' time of
3 months was achieved with less than 1% infant mortality rate experienced during the
first week of testing per box. Figure 9 shows the hardware configuration.

Prior  to  delivery  to  SRI,  the  following  tests  were  conducted. 

A.  The  extended  Bendix  BDX  930  CPU  Test  was  conducted  on  all  7  SIFT  computers  and  run 
continuously for 30 minutes each with no failures. This tests all the instructions
in  the  BDX  930  repertoire.  A  memory  test  was  conducted  on  all  the  SIFT  computers  in 
which  bit  patterns  were  written  and  then  verified  in  all  memory  locations,  main 
memory,  transaction  file,  data  file,  and  discrete  registers.  This  test  was  conducted 
continuously  for  20  minutes  each  with  no  failures. 

B.  Interprocessor data transfer tests were conducted where each processor's ID was
stored in the upper data file locations from 7780₁₆ to 77F7₁₆. All
SIFT processors then broadcast simultaneously their respective ID values to all other
processors. Each data file was printed out and verified as to correct data content.
Each SIFT processor was moved to a new ID location and the test was repeated. An
additional test was also conducted where the Real-Time-Clock value of each processor
was transmitted once to each other processor and then verified by reading the data
file. This test was conducted for each data file location. These tests were conducted
at full clock rates and no failures were noted.

C.  All SDS (Software Development System) functions were performed simultaneously on the
7 SIFT processors including halt, read, load, and restart with no failures noted.

D.  Data transfer tests over the 1553A Data Link were conducted for blocks of words from
1 to 32 in Controller to Terminal and Terminal to Controller mode for all SIFT processors
simultaneously. Terminal to Terminal transmission mode tests were not conducted
since the system is not configured for this mode of operation. This capability
is provided for in each processor.

All  the  tests  were  conducted  at  the  nominal  input  voltage  of  28  volts  DC,  and  again  at 
32  volts  DC  and  24  volts  DC,  with  no  failures  noted. 
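The memory test of item A (write bit patterns, then verify every location) can be sketched as below. The four patterns are an assumption, since the text does not list the ones actually used, and the `StuckAtZero` class is a hypothetical fault model included only to show that the check detects a defect.

```python
PATTERNS = (0x0000, 0xFFFF, 0xAAAA, 0x5555)  # assumed 16-bit test patterns


def memory_test(memory, patterns=PATTERNS):
    """Write each pattern to every location, then read back and verify.

    Returns a list of (address, expected, actual) mismatches.
    """
    failures = []
    for pattern in patterns:
        for addr in range(len(memory)):
            memory[addr] = pattern
        for addr in range(len(memory)):
            if memory[addr] != pattern:
                failures.append((addr, pattern, memory[addr]))
    return failures


class StuckAtZero(list):
    """Simulated memory in which bit 0 of one address is stuck at 0."""

    def __init__(self, size, bad_addr):
        super().__init__([0] * size)
        self.bad_addr = bad_addr

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value &= ~1  # the faulty cell cannot hold a 1 in bit 0
        super().__setitem__(addr, value)


good = memory_test([0] * 8)
bad = memory_test(StuckAtZero(8, bad_addr=3))
```

Alternating patterns such as AAAA and 5555 are a common choice because together they drive every bit to both states; only the patterns with bit 0 set expose the simulated fault here.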




Concluding Remarks. A brief description of the SIFT system, with emphasis on implementation,
has been presented. The system, a multi-processor computing system that relies on
software-implemented fault detection and reconfiguration algorithms, is an efficient
approach to the design of ultra-reliable avionics. Its development will pave the way
for the acceptance of fly-by-wire and other advanced flight control systems.

Potential applications of SIFT include all new technology aircraft, including the energy-efficient
transport, military developments such as a new strategic bomber, and spacecraft
such as an Advanced Shuttle. SIFT should be considered whenever flight control
system survivability requirements cannot be satisfied with conventional triplex or dual-dual
configurations. It should also be considered in the context of integrated avionic
systems having a mixture of flight-critical and non-flight-critical sub-systems.


FIGURE 4


RECEIVER SEQUENCER
FIGURE 5


MEMORY MAP
FIGURE 6


DISCRETES
FIGURE 7




STATE-OF-THE-ART 
COMPUTER  MONITORING  EQUIPMENT 


Harvey  G.  Nelson 
Naval  Weapons  Center 
Facility  Engineering  Branch  (Code  3115) 
China  Lake,  CA  93555,  U.S.A. 


SUMMARY 

In any tactical airborne computing system, it is crucial for developers and maintenance personnel to know in considerable
detail what is happening inside the computer on a real-time basis. This is especially true for a distributed system. This
paper describes a hardware monitor, called SOVAC (Software Validation and Control), that provides a high-capacity, real-time,
user-selective “window” that gives visibility into the internal workings of the tactical computer.

1.  INTRODUCTION 

SOVAC  is  a  computer  monitor  and  controller  that  can  be  thought  of  in  terms  of  its  basic  components  and  the  environment 
it  is  used  in.  Figure  1.1  illustrates  this  concept. 


FIGURE  1.1.  The  SOVAC  system  and  its  environment. 


There are three major components of the system: (1) The tactical computer and its environment. The minimum requirement
is to have an operating tactical computer. The computer may be installed in an operational aircraft, or installed in an
aircraft simulator or on a test bench. (2) The special-purpose hardware, which will be described in section 4, is attached to
the tactical computer AGE (Automatic Ground Equipment) port. This allows it to monitor and, if desired, control the
tactical computer’s internal operation. (3) Connected to the special-purpose hardware is a general-purpose minicomputer with
a cathode-ray-tube-type terminal. Our systems are using a Digital Equipment Corporation PDP-11/34 with a VT100
terminal. If hardcopy and/or graphics is desired, a suitable terminal can be added. This host minicomputer with its special-purpose
SOVAC software provides the user interface to the SOVAC system and, through its use, the interface to the tactical
computer.

The  basic  functions  of  SOVAC  are:  (1)  Automated  computer  control.  This  includes  the  ability  to  start,  stop,  and  reset  the 
computer.  The  user  may  also  load  (or  read)  memory  or  a  part  of  memory.  (2)  Computer  imaging.  A  copy  of  the  contents 
of  all  internal  registers  is  maintained  at  all  times.  (3)  Data  compare.  SOVAC  provides  the  user  with  the  ability  to  take 
action  based  on  the  value  of  a  specific  piece  of  data.  (4)  Counting.  SOVAC  can  count  the  number  of  times  a  specific  event 
takes  place.  (5)  Timing.  The  time  between  any  two  events  can  be  accurately  measured.  (6)  Breakpoint/event  detection. 
SOVAC  can  detect  a  user-defined  condition  such  as  the  access  of  a  specific  address  location,  counter  value,  or  data  value 
(or  combination  thereof)  and  then  take  a  user-specified  action.  (7)  Data  selection  and  logging.  The  user  may  selectively 
cause  a  large  number  of  data  words  and/or  registers  to  be  logged  each  computer  cycle.  (8)  Tracing.  Four  types  of  traces 
with  three  triggering  modes  are  available  under  user  control. 
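Three of the functions listed above (counting, timing, and breakpoint/event detection) can be illustrated with a small software model. This is a sketch only: SOVAC implements these functions in special-purpose hardware, and the `Monitor` class, its method names, and the trace format are all hypothetical.

```python
class Monitor:
    """Passive monitor model: observes bus accesses and never alters them."""

    def __init__(self):
        self.conditions = []  # (name, predicate, action) triples
        self.counts = {}      # event name -> number of occurrences
        self.last_seen = {}   # event name -> time of latest occurrence

    def on(self, name, predicate, action=None):
        """Register a user-defined condition and an optional action."""
        self.conditions.append((name, predicate, action))
        self.counts[name] = 0

    def observe(self, time_us, address, data):
        """Feed one bus access; fire every registered event that matches."""
        for name, predicate, action in self.conditions:
            if predicate(address, data):
                self.counts[name] += 1
                self.last_seen[name] = time_us
                if action:
                    action(time_us, address, data)

    def elapsed(self, first, second):
        """Time between the latest occurrences of two events."""
        return self.last_seen[second] - self.last_seen[first]


log = []
mon = Monitor()
mon.on("enter", lambda a, d: a == 0x0100)
mon.on("exit", lambda a, d: a == 0x0180)
mon.on("big", lambda a, d: a == 0x0200 and d > 1000,
       action=lambda t, a, d: log.append((t, d)))

# Hypothetical trace of (time in microseconds, address, data) accesses.
trace = [(0.0, 0x0100, 0), (2.5, 0x0200, 1500), (4.0, 0x0200, 7), (10.0, 0x0180, 0)]
for t, a, d in trace:
    mon.observe(t, a, d)
```

Here the "big" condition combines an address match with a data compare and logs the access as its action, while `mon.elapsed("enter", "exit")` gives the time between the two address events.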

SOVAC  is  a  powerful  tool  for  anyone  who  has  a  need  to  know  what  is  happening  inside  a  tactical  computer.  It  is  designed 
to  be  a  common  tool,  with  an  absolute  minimum  of  user-related  differences  between  its  use  on  various  tactical  computers. 
A  key  feature  of  SOVAC  is  that  it  is  entirely  passive  in  its  effect  on  the  target  computer  unless  the  operator  specifically 
dictates  otherwise.  Thus,  SOVAC  provides  a  flexible,  real-time,  user-interactive  tool  that  can  greatly  increase  the  productivity 
of  those  who  work  with  tactical  computers. 

In this paper, I will review the basic steps of software engineering as applied to the life cycle of tactical software. After
establishing a common reference point, I will discuss the uses of a computer monitor and display device like SOVAC in
each of the phases of the software cycle. Following that, I will explain in greater detail the SOVAC functions and then
briefly describe its architecture.

2.  SOFTWARE  ENGINEERING  REVIEW 

2.1.  Software  System  Life  Cycle  Overview 

In this section, I will discuss very briefly the life cycle process associated with tactical airborne computing systems. Most of
my  comments  can  be  applied  to  the  whole  system,  that  is,  to  the  hardware  and  software.  However,  I  will  be  directing  my 
comments  primarily  toward  the  software  issues.  Figure  2.1  illustrates  the  software  life  cycle  process  (Jensen  and  Tonies, 
1979).  By  life  cycle  process,  I  am  referring  to  the  life  of  the  system  from  its  conception  through  development  and  its 
operational  use,  including  maintenance  and  updates. 




FIGURE  2.1.  Software  system  life  cycle. 


I  will  briefly  discuss  each  phase  of  the  software  life  cycle  in  the  following  paragraphs: 

2.1.1.  Survey.  Before  one  gets  into  a  serious  analysis,  it  is  important  to  determine  if  there  is  justification  for  proceeding 
with  a  project  and  to  document  the  related  parameters.  The  survey  phase  is  much  like  the  analysis  phase,  in  that  it  tries  to 
define  what  will  be  built  and  to  produce  a  tentative  budget  and  schedule  information.  However,  in  the  survey  phase  the 
tasks  accomplished  are  done  with  the  minimum  of  rigor  needed.  The  output  of  this  process  is  the  feasibility  document.  In 
simple  projects,  the  survey  may  be  very  brief. 

2.1.2.  Analysis.  This  process  is  quite  involved  and  extremely  important.  It  is  in  the  analysis  phase  that  the  in-depth 
analysis  of  what  is  desired  is  documented.  The  analyst  is  responsible  for  understanding  the  needs  and  desires  of  the  user 
and  for  applying  his  analysis  tools  to  derive  the  specification.  The  specification  should  completely,  but  in  an  easy-to- 
understand  form,  define  what  the  desired  system  should  be  able  to  do. 

2.1.3.  Design. The design process determines how the desired system will be implemented. It concerns itself with how the
various functions are to be allocated among various modules and what the resultant modular interfaces will be. The modules
are then designed by incorporating detailed specification information into a set of module descriptions. Finally, in the
packaging process, the environment-independent design is modified to take into account the realities of the machines,
operating systems, coding languages, data base processors, and so forth. In addition to the packaged design, the test plan is
also generated at this stage.

2.1.4.  Implementation. After the design phase, we arrive at the coding phase. I am also including in this phase the
testing associated with each module before it is integrated with the other modules of the system and the integration of the
individual modules into a complete package. This phase may also include any hardware/software integration required. I am
including integration in the implementation phase because it is often done concurrently with the coding. The successful
conclusion of this phase is an integrated program that is ready for final testing.

2.1.5.  Testing. In the testing phase, the total system is tested to ensure that the requirements as derived in the analysis
phase have been met and that the system is ready for operational use. This is a very crucial step for tactical systems for
obvious reasons. It is also an extremely difficult task in that it is impossible to completely test even a relatively simple
computer program. The result of successfully completing this phase is that the system is ready for operational use.

2.1.6.  Iteration. It would be nice if we could completely finish each phase correctly before proceeding to the next.
However, it seldom, if ever, works out that way. For example, the designer, in determining how to implement a specific
requirement, may find that the requirement is unreasonable or perhaps impossible. Thus, formally or informally, the requirement
is modified. This iteration is shown in Figure 2.1 as the arrow coming out of the bottom of the box and turning
to the left. It is extremely important to provide for and control the iteration process rather than to ignore it.

2.1.7.  Entropy.  Entropy  is  the  energy  that  is  dissipated  during  each  process  and  thus  does  not  show  up  in  the  final 
output  of  the  phase.  Entropy  is  symbolized  in  Figure  2.1  by  the  squiggly  arrow.  Entropy  is  a  waste  of  resources.  In  our 
environment, entropy refers to a waste of programmer manpower, slipped schedules, cost overruns, and perhaps failure of
the  project.  It  is  this  issue  relative  to  the  processes  that  brings  us  to  the  subject  of  tools  for  software  life  cycle  use. 

2.1.8.  General  Comments.  My  purpose  in  presenting  the  above  breakdown  of  the  process  is  to  provide  us  with  a  common 
view  of  the  process.  This  will  facilitate  the  discussion  that  follows. 




2.2.  Software  Tools 

Bell Laboratories has established the concept of a Programmer's Workbench. As Bell Laboratories uses the term, it is a
collection of programs integrated with an enlightened operating system (Kernighan and Plauger, 1976). The integrated set of
programs that can be used to help the programmer perform his work are called software tools. SOVAC is a tool that is
composed of more than just programs.

2.3.  SOVAC 

2.3.1.  As noted in section 1, SOVAC is a tool that is composed of a general-purpose minicomputer and special-purpose
hardware  and  software.  It  is  a  tool  that  provides  the  user  with  very  powerful  computer  control  and  monitoring  capabilities. 
As  a  tool,  SOVAC  is  most  important  in  the  implementation,  test,  and  operational  phase,  as  discussed  in  section  2.1  above. 
As  will  be  shown  later,  it  also  is  applicable  to  the  survey,  analysis,  and  design  phases. 

2.3.2.  In  this  section  I  will  discuss  the  uses  of  SOVAC  in  the  context  of  the  above  discussion  of  the  software  life 
cycle.  For  each  phase,  I  will  suggest  ways  that  we  can  decrease  the  entropy  of  the  process  by  using  a  tool  such  as 
SOVAC. 

2.3.2.1.  Analysis Phase. In this phase it is especially important to know what is being asked for. As mentioned above,
SOVAC  has  the  ability  to  measure  or  quantify  most  information  that  one  desires  to  know  about  the  internal  workings  of  a 
program  in  a  tactical  computer.  Thus,  if  there  is  a  prototype  or  earlier  version  of  the  system,  SOVAC  can  be  used  to 
provide  benchmark  information  such  that  there  can  be  higher  confidence  in  the  resultant  requirements  document. 

2.3.2.2.  Design Phase. The ability to quantify the operation of a previous version, or a prototype, of the software can
provide valuable information for the designer. In addition, the designer could use SOVAC to modify or insert a special
algorithm into the target computer for evaluation.

2.3.2.3.  Implementation Phase. SOVAC can be used in many ways during the implementation phase. It is in this phase
that concern is primarily with the validity of a particular module or perhaps a small group of modules. SOVAC could be
used to load these into the tactical computer with or without supporting modules. One could then single-step through the
module, collect data, and ensure that on a stand-alone basis the module works as expected.

2.3.2.4.  Test Phase. In this phase, the ability to monitor the operation of the computer without affecting its operation
is also crucial. It has been shown that exhaustively testing all combinations of paths through even relatively
small programs is impossible. SOVAC may be used to collect much more internal data about the operation of the program
and thus to gain a confidence in the program as a whole that may be very difficult to obtain otherwise.

2.3.2.5.  Operational  Phase.  In  this  phase,  the  ability  of  SOVAC  to  monitor  and  collect  data  about  the  internal  operation 
of  the  tactical  computer  without  affecting  its  operation  is  crucial.  SOVAC  can  be  used  to  collect  the  data  necessary  to 
assess  the  validity  of  the  operational  program.  As  the  program  moves  toward  the  need  for  being  updated  in  either  a  major 
or  minor  manner,  SOVAC  can  be  a  very  useful  tool  to  collect  the  data  needed  for  the  decision-making  process.  This  ability 
is  discussed  further  in  the  comments  on  the  analysis  and  design  phases. 

2.4.  SOVAC  as  a  Hardware  and  System  Tool 

SOVAC is also an indispensable tool for avionics systems personnel and the hardware engineers and technicians. With the
SOVAC, they can monitor and record the input and output activity. For example, it could be used to determine that a
certain bit is always 0 or 1. For output channel testing, the SOVAC could be used to force a particular bit pattern on a
specific output channel. Systems personnel could use it in much the same way to find out what type of data is being
passed around the system. This could be especially useful with distributed systems.
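The stuck-bit check mentioned above can be illustrated with a short sketch. It is not SOVAC software; the function name `stuck_bits`, the word width, and the sample data are assumptions made for illustration. The idea is simply that a bit which never appears as 1 in any recorded word is a candidate for "always 0", and a bit which never appears as 0 is a candidate for "always 1".

```python
def stuck_bits(samples, width=16):
    """Return (stuck_at_0, stuck_at_1) bit masks for recorded channel words.

    `samples` is a hypothetical list of integers recorded from one channel.
    """
    if not samples:
        raise ValueError("need at least one recorded word")
    mask = (1 << width) - 1
    seen_ones = 0   # bits observed as 1 at least once
    seen_zeros = 0  # bits observed as 0 at least once
    for word in samples:
        word &= mask
        seen_ones |= word
        seen_zeros |= ~word & mask
    stuck_at_0 = ~seen_ones & mask   # never seen as 1
    stuck_at_1 = ~seen_zeros & mask  # never seen as 0
    return stuck_at_0, stuck_at_1


# Three 4-bit words recorded from an imaginary input channel.
always_0, always_1 = stuck_bits([0b1010, 0b1110, 0b1011], width=4)
print(f"always 0: {always_0:04b}, always 1: {always_1:04b}")
```

A result of this kind only shows that a bit did not change during the recording; a longer recording, or forcing a pattern on the channel, would be needed to conclude that the bit cannot change.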

3.  SOVAC  FUNCTIONS 

In  this  section  I  will  explain  in  moderate  detail  each  of  the  basic  SOVAC  functions.  The  basic  functions  of  SOVAC  are 
summarized  in  Figure  3.1. 

3.1.  Automated  Computer  Control 

Automated control includes the ability to start, stop, and reset the computer. Also, it is possible to load (or read) memory
or  a  part  of  memory.  Examples: 

Load  from  a  file  all  (or  part)  of  memory. 

Verify  all  (or  part)  of  the  computer  memory  against  any  specified  file. 

Copy  from  computer  memory  to  disk,  display,  or  hardcopy  all  or  part  of  memory  in  hex,  ASCII,  binary,  or  real 
format. 

Reset  and  start  the  computer  at  the  program  entry  point. 

Halt  the  computer  based  on  any  user  specified  condition  or  breakpoint. 

Interactively  observe  and/or  change  any  memory  or  register  value. 
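The load and verify operations listed above can be pictured with a minimal sketch. It models the tactical computer memory as a plain Python list of words; `load_memory`, `verify_memory`, and the addresses used are hypothetical names and values, not part of SOVAC.

```python
def _check_range(memory, image, start):
    # Refuse images that would run past either end of the modelled memory.
    if start < 0 or start + len(image) > len(memory):
        raise ValueError("image does not fit in memory at this start address")


def load_memory(memory, image, start=0):
    """Copy `image` (a list of words) into `memory` beginning at `start`."""
    _check_range(memory, image, start)
    memory[start:start + len(image)] = image


def verify_memory(memory, image, start=0):
    """Return a list of (address, expected, actual) for every mismatch."""
    _check_range(memory, image, start)
    return [
        (start + offset, expected, memory[start + offset])
        for offset, expected in enumerate(image)
        if memory[start + offset] != expected
    ]


memory = [0] * 8                 # stand-in for part of the computer memory
load_memory(memory, [1, 2, 3], start=2)
memory[3] = 9                    # simulate a corrupted word
print(verify_memory(memory, [1, 2, 3], start=2))
```

Returning the mismatches as a list, rather than a single pass/fail flag, reflects the interactive use described in the text, where the user may want to inspect or change the offending locations.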


FIGURE 3.1.  The ten basic functions of SOVAC: user interface; automated computer control; computer imaging; data
compare; counting; timing; breakpointing/event detection; complex condition breakpointing; data selection and logging;
tracing.


3.2.  Computer  Imaging 

A copy of the contents of all internal registers is maintained at all times. The contents of these image registers may be
used  for  real-time  logging  and  display  purposes. 

3.3.  Data  Compare 

SOVAC  provides  the  user  with  the  ability  to  take  action  based  on  the  value  of  a  specific  piece  of  data.  The  compare  may 
be  done  against  any  memory  location  or  specified  register. 

3.4.  Counting 

At  the  occurrence  of  any  specified  event,  one  of  several  counters  may  be  reset,  incremented,  decremented,  or  read.  This 
allows  SOVAC  to  collect  data  on  the  number  of  times  a  specific  event  takes  place. 

3.5.  Timing 

At  the  occurrence  of  any  specified  event,  one  of  several  timers  may  be  reset,  started,  stopped,  or  read.  Thus,  the  time 
between  any  two  events  can  be  accurately  measured. 
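A small software model may help to show how the counting and timing functions work together. In this sketch the recorded activity is a hypothetical list of (cycle number, event name) pairs, and time is expressed in computer cycles; in SOVAC itself these functions are performed by hardware counters and timers.

```python
def measure(trace, start_event, stop_event):
    """Count `start_event` and time each start-to-stop interval in cycles.

    `trace` is a list of (cycle, event) pairs in increasing cycle order.
    """
    count = 0
    started_at = None
    intervals = []
    for cycle, event in trace:
        if event == start_event:
            count += 1            # counter incremented on the event
            started_at = cycle    # timer reset and started
        elif event == stop_event and started_at is not None:
            intervals.append(cycle - started_at)  # timer stopped and read
            started_at = None
    return count, intervals


trace = [(10, "enter"), (14, "other"), (25, "exit"),
         (40, "enter"), (47, "exit"), (50, "exit")]
print(measure(trace, "enter", "exit"))
```

The last "exit" in the example is ignored because no timer is running at that point, which is one possible policy; other policies are equally plausible.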

3.6.  Breakpoint/Event  Detection 

SOVAC  can  detect  a  user-defined  condition  such  as  the  access  of  a  specific  address  location,  counter  value,  or  data  value 
and  then  take  a  user-specified  action. 

3.7.  Complex  Condition  Breakpointing 

The  user  may  select  a  complex  combination  of  events  described  in  the  paragraphs  above  and  use  these  to  initiate  the 
breakpoint. 
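One way to think about a complex condition is as a Boolean combination of simple conditions on the address, data, and counter values. The sketch below assumes a hypothetical snapshot dictionary with the keys `address`, `data`, and `counters`; the helper names are illustrative and do not describe the SOVAC user interface.

```python
def address_is(addr):
    return lambda snap: snap["address"] == addr


def data_equals(value):
    return lambda snap: snap["data"] == value


def counter_at_least(name, n):
    return lambda snap: snap["counters"].get(name, 0) >= n


def all_of(*conditions):
    return lambda snap: all(cond(snap) for cond in conditions)


def any_of(*conditions):
    return lambda snap: any(cond(snap) for cond in conditions)


# Break when address 0x0200 is accessed AND (data is 0xFFFF OR the
# "loop" counter has reached 3). All values are made up for illustration.
condition = all_of(
    address_is(0x0200),
    any_of(data_equals(0xFFFF), counter_at_least("loop", 3)),
)

snapshot = {"address": 0x0200, "data": 0x0000, "counters": {"loop": 3}}
print(condition(snapshot))
```

Building the condition from small predicates keeps each simple event reusable, which mirrors the way the text describes complex breakpoints as combinations of the events introduced in the preceding paragraphs.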

3.8.  Data  Selection  and  Logging 

The user may selectively cause a large number of data words and/or registers to be logged each computer cycle.

3.9.  Tracing 

Four  types  of  traces  with  three  triggering  modes  are  available  under  user  control.  For  each  type  of  trace,  the  user-specified 
data  is  stored  in  a  first-in-first-out  history  (FIFO)  stack.  This  is  done  on  a  real-time  basis  without  altering  the  operation  of 
the  tactical  computer. 

The  types  of  traces  available  include: 

Full  instruction  trace  where  each  instruction  cycle  initiates  the  storage  into  the  FIFO  stack. 

Partial  instruction  trace  where  each  instruction  within  a  user-specified  range  is  recorded.  This  allows  concentration  of 
the  data  gathering  resources  to  a  specific  area  of  interest. 




Event trace allows data to be stored at each occurrence of a specific event.

Branch  trace  allows  the  storage  to  take  place  when  the  instruction  counter  changes  by  greater  than  a  user-specified 
value. 

The  three  modes  of  the  trace  function  are: 

Normal.  In  this  mode  the  trace  action  takes  place  when  the  breakpoint  occurs. 

Delay. In this mode the trace action is delayed from the specified event by a user-selectable time, count, or number
of  events. 

Prerecord.  In  this  mode  the  trace  action  takes  place  at  each  specified  event  and  is  stopped  by  a  user-specified  event. 
This  allows  the  operations  up  to  a  specific  event  to  be  stored.  Adding  a  user-specified  delay  to  this  mode  allows  the 
operations  before  and  after  the  specified  event  to  be  recorded. 
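The prerecord mode, with and without an added delay, can be sketched in software with a bounded buffer. The sketch assumes that every cycle is recorded (as in a full instruction trace) and that the history store discards its oldest entry when full; the depth, the trigger test, and the function name `prerecord_trace` are assumptions, not SOVAC parameters.

```python
from collections import deque


def prerecord_trace(cycles, is_trigger, depth, delay=0):
    """Record items into a bounded history until the trigger, plus `delay`.

    Returns the history contents, oldest first. With delay=0 the history
    ends at the trigger; a positive delay also keeps items after it.
    """
    history = deque(maxlen=depth)   # oldest entries drop out when full
    remaining = None                # None until the trigger has been seen
    for item in cycles:
        history.append(item)
        if remaining is None:
            if is_trigger(item):
                remaining = delay
        else:
            remaining -= 1
        if remaining == 0:
            break
    return list(history)


# Items are stand-ins for instruction cycles; the trigger is item 5.
print(prerecord_trace(range(10), lambda c: c == 5, depth=4))
print(prerecord_trace(range(10), lambda c: c == 5, depth=4, delay=2))
```

With no delay the history holds the operations leading up to the event; with a delay of two it holds operations on both sides of the event, at the cost of keeping fewer of the earlier ones.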

4.  SOVAC  ARCHITECTURE 

4.1.  Overview

The  basic  functional  components  of  SOVAC  are  shown  in  Figure  4.1  below. 

The  following  sections  will  explain  the  role  of  each  part  of  the  SOVAC  hardware  in  further  detail. 

4.2.  Tactical  Computer  Interface 

The  tactical  computer  interface  is  composed  of  the  computer  control  section  and  the  computer  image  section.  The  computer 
control  section  provides  real-time  control  of  the  tactical  computer  and  provides  the  capability  to  capture  information 
available  on  the  tactical  computer’s  bus  and  control  lines.  This  is  the  most  difficult  of  the  sections  to  design.  In  general  we 
have  found  it  very  hard  to  obtain  the  level  of  documentation  necessary  to  make  the  design  straightforward.  The  techniques 
used  have  combined  detailed  analysis  of  the  computer  documentation  available  and  the  use  of  logic  analyzers  to  gather 
empirical  data  about  the  activity  on  the  AGE  port.  It  also  happens  that  not  all  desired  signals  are  available  at  the  AGE 
port. This makes the SOVAC design much more difficult in that the internal activity of the computer must be inferred
from  the  activity  of  the  signals  that  are  available. 

The  computer  image  section  contains  a  copy  of  each  of  the  tactical  computer’s  registers.  For  those  situations  in  which  a 
given  register  may  have  more  than  one  function  during  the  execution  of  a  specific  instruction,  a  copy  of  the  register  for 
each of its functions is provided. Thus, at any time, the contents of all the internal registers are available for whatever
use  is  desired.  These  uses  will  be  discussed  in  the  following  sections. 

4.3.  SOVAC  Controller 

The high-speed, microprogrammed SOVAC controller coordinates the operation of the various subsystems. It monitors the
contents of the tactical computer image registers and has the capability to recognize various types of events or complex
combinations of events and set a breakpoint or store data into a hardware FIFO stack that is 18 words wide. The data
selection and logging capability is very flexible. The breakpoints can be used for a wide variety of purposes including the
control of counters and timers.

The functions of the controller are set up under the control of the user. The controller must do all of its evolutions within
the instruction cycle time of the tactical computer. Thus, the activity in the controller is several orders of magnitude faster
than could be controlled from the host minicomputer. Therefore, the controller functions are set up by the host
minicomputer under user control.

4.4.  Minicomputer  Interface 

This  interface  is  placed  on  the  minicomputer  system  bus  and  provides  the  data  path  between  the  minicomputer  and  the 
SOVAC  controller. 


FIGURE 4.1.  SOVAC hardware system (block diagram; external connections to the minicomputer and to the tactical
computer).


4.5.  Commonality/Portability


This is an appropriate time to note that the SOVAC concept is not applicable to just one specific tactical computer. In
fact, considerable effort has been expended to minimize the uniqueness of each SOVAC. This is especially true of the user
interface. If a user knows how to use a SOVAC for one tactical computer, he should know how to use it for any others
without further training. For maintenance purposes it is also desirable to minimize the differences between the hardware
designs of the various SOVAC models. Referring to Figure 4.1, the minicomputer interfaces (and the minicomputers and
peripherals) are common among all SOVACs. Also, the bus systems are common between the SOVACs. The computer
control  and  computer  imaging  hardware  is  unique  to  each  tactical  computer  type.  The  event  detection,  selection  and  logging, 
and  system  controller  are  similar  in  concept  but  unique  in  implementation  due  to  the  differences  in  the  number  and  type 
of  registers  in  the  various  computers. 

5.  CONCLUSION 

SOVAC is a powerful tool for anyone who has a need to know what is happening inside a tactical computer. It is designed
to be a common tool with an absolute minimum of user-related differences between its use on various tactical computers.
A  key  feature  of  SOVAC  is  that  it  is  entirely  passive  in  its  effect  on  the  target  computer  unless  the  operator  specifically 
dictates  otherwise.  Thus,  SOVAC  provides  a  flexible,  real-time,  user-interactive  tool  that  can  greatly  increase  the  productivity 
of  those  who  work  with  tactical  computers. 


REFERENCES/BIBLIOGRAPHY

DeMARCO, Tom, 1978, “Structured Analysis and System Specification”, Yourdon Press.

DeMARCO, Tom, 1979, “Concise Notes on Software Engineering”, Yourdon Press.

FLETCHER, W. I., 1980, “An Engineering Approach to Digital Design”, Prentice-Hall.

JENSEN, R. W., and TONIES, C. C., 1979, “Software Engineering”, Prentice-Hall.

KERNIGHAN, B. W., and PLAUGER, P. J., 1976, “Software Tools”, Addison-Wesley.

FRYER, R. E., 1980, “The User Interface for a Real-Time Software Debugging System”, Proc. of the Fourteenth Asilomar
Conference on Circuits, Systems, and Computers.

LEMON, L. M., 1979, “Hardware System for Developing and Validating Software”, Proc. of the Thirteenth Asilomar
Conference on Circuits, Systems, and Computers.

YOURDON, E., 1979, “Classics in Software Engineering”, Yourdon Press.

YOURDON, E., and CONSTANTINE, L. L., 1979, “Structured Design: Fundamentals of a Discipline of Computer Program
and Systems Design”, Prentice-Hall.




INTEGRATED  CONTROL  OF  MECHANICAL  SYSTEMS  FOR 
FUTURE  COMBAT  AIRCRAFT 


G.  K.  WILCOCK 

Royal  Aircraft  Establishment,  Farnborough,  U.K. 

P. A. LANCASTER & C. MOXEY
British  Aerospace,  Warton  Division,  U.K. 

SUMMARY 

This  paper  describes  a  system  for  achieving  digital  control  and  monitoring  of  Utility 
Systems  for  future  combat  aircraft.  The  aim  is  to: 

i)  Reduce penalties such as mass and engine power take-off associated with conventional
systems.

ii)  Reduce  pilot  workload, 

iii)  Improve  maintainability, 

iv)  Increase  survivability, 

v)  Reduce  the  cost  of  ownership. 

The  paper  explores  various  approaches  to  system  design,  leading  to  a  system  utilising 
distributed processors and data terminals linked via interfaces to the Utility Systems'
components.

The  work  to  date  has  shown  that  a  significant  number  of  the  objectives  can  be  achieved; 
for example, a weight saving of approximately 100 kg (i.e. 50%), and a pilot workload
reduction  of  the  order  of  4:1,  may  be  achieved  in  a  twin  engine  combat  aircraft. 


1.  INTRODUCTION 

In today's generation of combat aircraft, mechanical systems or "Utility Systems" - such
as  those  associated  with  Powerplant  Control,  Secondary  Power,  Environmental  Control, 
Hydraulics  and  Fuel  Gauging/Management  -  have  been  designed  as  individual  systems  and 
consequently  have  their  own  dedicated  control  units.  The  result  is: 

1.  A large number of dedicated, single function Line Replaceable Units (LRUs).

2.  Boxes  containing  relay  and  diode  logic. 

3.  Large  numbers  of  discrete  wires. 

4.  Many  dedicated  cockpit  Instruments. 

5.  Dedicated  switches  and  warning  lamps. 

The  many  interconnections  result  in  large  cable  looms;  these  impose  a  severe  installation 
penalty  on  the  aircraft.  The  cable  looms  may  also  be  subject  to  damage  and  Electro- 
Magnetic  Interference. 

For  future  aircraft  this  method  of  control  is  likely  to  be  unsatisfactory  due  to  the 
limited  space  available  and  is  inefficient  in  terms  of  equipment  utilisation.  In 
addition,  future  high  technology  combat  aircraft  will  incorporate  a  highly  integrated 
avionic  system  and  will  require  increased  automation  from  all  systems,  including  Utility 
Systems, in order to significantly reduce the pilot workload (especially in a single
cockpit configuration).

The above demands, together with the increasing use of serial digital data transmission
systems, mean that alternative design methods must be applied to Utility Systems.

A  considerable  amount  of  research  work  has  been  progressing  at  British  Aerospace,  Warton 
(under  both  MOD  and  Private  Venture  funding)  and  also  at  the  Royal  Aircraft  Establishment, 
Farnborough,  into  alternative  methods  of  controlling  the  Utility  Systems.  The  most 
favoured  approach  for  realising  this  control  is  to  consider  a  Central  Management  System 
which controls ALL of the Utility Systems, as listed in Para. 3.

The  result  of  this  approach  is  the  Integrated  Control  of  Mechanical  Systems  (INCOMS) 
which  is  based  on  a  number  of  data  acquisition  and  control  units  (INCOMS  Processors)  which 
are geographically dispersed throughout the airframe. These INCOMS Processors will
operate independently as individual computing centres, and will be interconnected via a
MIL-STD 1553B data bus (or its derivative). Some of these Processors will act as remote
terminals, collecting raw data for onward transmission (via the data bus) to their
designated processing centre(s).




It must be borne in mind that, whilst most Utility components are inherently simplex in
nature (in terms of their input and output functions), the systems in which they are
incorporated  are  mainly  safety  critical  systems.  Separate  redundant  components  and 
control  circuits  are  included,  where  appropriate,  to  achieve  the  necessary  reliability. 


2.  BACKGROUND SUPPORT ARGUMENT

Various  reasons  can  be  cited  to  support  the  view  that  digital  control  should  be  used 
(Seabridge,  A.  G.,  1979;  Smith,  T.  B.,  et  al,  1978).  The  general  arguments  for  supporting 
the implementation of digital control for utility systems (INCOMS) are twofold, namely:

a)  Safety. 

b)  Efficiency. 

a)  SAFETY improvements can be realised by:

i)  System integrity improvements through increased reliability and the more
efficient use of redundancy techniques.

ii)  Reduction of the pilot workload as a result of increased system automation.

iii)  Improved communication capability (between the sub-systems).

b)  EFFICIENCY  improvements  are  expected  due  to: 

i)  Reduced  wiring  and  reduced  weight. 

ii)  Easing of the maintenance task by providing a self-test and fault diagnosis
capability. This would provide status indications to flight and ground crews
to establish pilot confidence in correct system operation (even under fault
conditions). It would also reduce the maintenance time.

iii)  Reduction  in  the  initial  and  life  cycle  costs. 

iv)  Improved System Performance leading to load scheduling and, hence, more
efficient  use  of  available  power. 

A  further  argument  to  support  the  general  philosophy  is  that  the  introduction  of  digital 
control  allows  the  system  to  be  designed  so  that,  even  in  service,  but  particularly 
during development, advantage can be taken of technological advance. In addition the
system can be adapted at reasonable cost to meet changing demands (Durkin, H., 1977).

On  today's  aircraft  the  majority  of  control  changes  tend  to  be  hardware  orientated,  whereas 
in  INCOMS  it  is  anticipated  that  most  changes  can  be  implemented  in  software,  thereby 
reducing  the  impact  on  the  airframe. 

The  adoption  of  this  type  of  system  will  give  a  reduction  in  the  number  of  dedicated 
equipments,  thereby  reducing  overall  equipment  mass  and  volume;  and  should  promote  the 
design  and  use  of  common  items  of  hardware. 

INCOMS  will  have  a  significant  impact  on  the  crew-system  interface  and  thus  upon  the 
requirements for cockpit displays and controls. A fully automatic management system with
the  ability  to  operate  without  significant  degradation  under  fault  conditions  will  reduce 
the  necessity  to  continuously  display  status  information.  A  digital  computing  system 
with  access  to  the  cockpit  via  MIL-STD  1553B  will  enable  the  pilot  to  communicate  with 
systems  via  non-dedicated  or  multifunction  switches. 


3.  EVOLUTION  OF  THE  INTEGRATED  CONTROL  OF  MECHANICAL  SYSTEMS  (INCOMS)  PHILOSOPHY 


The systems that are being considered in the global term "Utility Systems" are shown below:

Engine and Associated Systems

*  Engine Intake De-Icing
*  Engine Speed Signals
*  Fire Detection and Suppression
*  System Warnings
*  Engine Starting and Ignition
*  Thrust Reverse Control
*  System Health Monitoring

Hydraulic and Associated Systems

*  Hydraulic Utilities
*  Hydraulic Depressurisation
*  Undercarriage Control
*  Flight Refuelling Probe
*  Systems Health Monitoring
*  Hydraulic Control
*  Brakes and Anti-Skid Control
*  Nosewheel Steering
*  Canopy Control
*  System Warnings

Fuel System

*  Fuel Management
*  Re/Defuel Transfer
*  Fuel-Flow Metering
*  System Health Monitoring
*  Fuel Boost Pump Control
*  Fuel Gauging
*  Hit Detection and Suppression
*  System Warnings

Oxygen Supply System

*  Nuclear/Chemical/Biological Protection

Environmental Control Systems

*  Cabin Temperature Control
*  Rain Dispersal
*  Equipment Bay Cooling
*  System Health Monitoring
*  Temperature & Pressure Safety Control
*  Canopy Standby De-Mist
*  Coolant System Control
*  System Warnings

Secondary Power System

*  Gearbox Control
*  Emergency Power Unit Control
*  Auxiliary Power Unit Control

Miscellaneous Systems

*  Cockpit Lighting
*  Landing and Taxy Lights
*  Windscreen Heating Control
*  Seat Adjustment
*  Electro-Luminescent Panel Control
*  Anti-Collision Lights
*  Probe Heating
*  Arrestor Hook

The  systems  range  from  the  very  complex,  e.g.  Fuel  Management,  to  the  very  simple,  e.g. 
Arrestor Hook. The one thing that all of these systems have in common is that they must
conform to the aims and constraints of INCOMS.

A  brief  description  of  the  "total  systems"  approach  that  has  been  adopted  at  BAe,  Warton 
and  at  RAE,  together  with  a  description  of  an  ideal  system  follows,  to  give  some  background 
to  the  technical  aspects  of  the  work  carried  out  to  date. 

Figure  1  shows  a  block  diagram  of  a  typical  integrated  avionic  system  envisaged  for  future 
aircraft,  but  with  the  Utility  systems  shown  as  they  exist  on  contemporary  aircraft  -  i.e. 
JAGUAR/TORNADO.  These  Utility  systems  have  individual  sets  of  components  and  control 
elements.  Only  five  control  elements  are  shown,  whereas  on  today's  aircraft  one  set  of 
control hardware would be expected for each of the systems. To connect the individual
systems to the cockpit would require a considerable amount of discrete wiring and would be
contrary  to  the  general  policy  of  using  digital  data  transmission. 

To  avoid  this  situation,  all  the  control  elements  could  be  combined  into  a  single  block 
called Utility Systems Management (see Figure 2), and that block connected as a terminal
onto  the  main  Avionics  Bus.  If  recognition  is  taken  of  the  distribution  of  Utility 
Systems  components  throughout  the  airframe,  it  will  be  seen  that  this  method  is  unacceptable 
for at least three reasons:

a)  The  large  amount  of  discrete  wiring  involved. 

b)  The  concentration  of  wiring  at  the  central  block. 

c)  The  susceptibility  of  the  central  block  to  damage  or  failure. 

These  problems  can  be  overcome  by  the  Integrated  Control  of  Mechanical  Systems  (INCOMS) 
whose  system  consists  of  a  number  of  data  acquisition  and  control  devices  situated  at 
strategic locations throughout the airframe (see Figure 3).

This will enable components local to the devices to be connected to the most suitable (e.g.
nearest)  device,  thereby  restricting  discrete  wiring  to  local  areas.  A  data  acquisition 
and  computing  sub-system  can  now  be  considered  which  consists  of  a  number  of  INCOMS 
processors  (the  current  work  at  BAe,  Warton  indicates  that  6  may  be  an  optimum  number)  which 
are  interconnected  via  a  MIL-STD  1553B  data  bus.  This  bus  is  in  turn  connected  to  the  main 
Avionics  Bus  via  Bus  Interface  Units  (BIFU)  which  can  also  act  as  Bus  Controllers.  This 
is  illustrated  in  Figure  4. 
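The idea of connecting each component to the most suitable (e.g. nearest) device can be sketched as a simple nearest-neighbour allocation. The processor names, component names, and airframe coordinates below are entirely hypothetical; a real allocation would also have to weigh integrity requirements and interface counts, which this sketch ignores.

```python
import math


def assign_to_nearest(components, processors):
    """Map each processor name to the components closest to it.

    Both arguments are dicts of name -> (x, y) position in metres.
    """
    assignment = {name: [] for name in processors}
    for component, position in components.items():
        nearest = min(
            processors,
            key=lambda name: math.dist(position, processors[name]),
        )
        assignment[nearest].append(component)
    return assignment


# Two processors and three components at made-up positions.
processors = {"forward": (2.0, 0.0), "aft": (12.0, 0.0)}
components = {
    "nosewheel steering": (1.0, 0.0),
    "fuel boost pump": (9.0, 1.0),
    "canopy control": (3.0, 1.0),
}
print(assign_to_nearest(components, processors))
```

Restricting each component to its nearest processor is what keeps the discrete wiring local; only the serial data bus then has to run the length of the airframe.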

The  interfaces  with  the  data  bus,  CPU  and  memory,  and  the  interfaces  with  the  mechanical 
system's  components  are  shown.  These  interfaces  will  allow  receipt  of  information  from 
discrete,  analogue,  digital  and  optical  devices/sensors,  and  power  control  interfaces  will 
allow power to be switched to devices such as valves, pumps, etc. In an ideal system
each  INCOMS  processor  would  be  hardware  identical,  with  its  individual  program  store 
taking  account  of  the  various  peripheral  components  in  each  INCOMS  processor’s  aircraft 
location. 




This  system  offers  a  minimum  hardware,  minimum  wiring  solution  that  can  be  envisaged  for 
a  future  combat  aircraft. 

Work  already  completed  by  British  Aerospace,  Warton  under  UK  MOD  funding,  tested  the 
above  philosophy  applied  to  a  possible  future  advanced  fuel  management  system.  The  aim 
of  this  work  was  to  define  a  suitable  overall  system  architecture  that  would  give  a 
management  system  in  which  greater  emphasis  is  placed  on  automation,  fault  detection/ 
tolerance  and  survival.  In  pursuance  of  this  a  number  of  possible  configurations  were 
considered.

In each of the configurations studied, control was assumed to be based on some form of
digital  computing  system,  installed  in  a  twin  engined  aircraft.  The  configurations 
ranged  from  a  system  employing  discrete,  dedicated  wiring  between  components  and  computer 
(Kaye,  A.,  1979;  Moxey,  C.,  1980)  to  those  employing  a  distributed  processing  system  with 
a serial data bus connection (Seabridge, A. G., 1980).

Each  configuration  was  tested  against  a  representative  aircraft  layout,  and  it  was 
observed  that  some  of  the  earlier  configurations  presented  major  wiring  problems,  in  that 
they  were  complicated  to  install  and  also  left  major  sections  of  the  system  susceptible  to 
battle  damage.  To  overcome  these  basic  problems  the  later  configurations  subjected  the 
system  to  further  detailed  examination  in  order  to  determine  the  optimum  interconnection 
of  components  and  computing  system  elements  that  satisfy  the  conditions  of  fault 
tolerance  and  serviceability,  whilst  also  being  possible  to  install. 

In  order  to  meet  the  overall  integrity  requirements  of  the  systems  included  in  INCOMS  it 
was  considered  necessary  to  develop  a  method  to  define  the  relative  importance  of  these 
systems and prepare from this a Criticality Analysis and Ranking List (CARL).

Information  thus  obtained  would  be  used  to  identify  if  a  need  exists  to  duplicate  data, 
to cross-strap from sensors to computing elements by hard wiring, or if a pre-set datum
needs to be introduced in the event of failure of primary data sources.

To  this  end  four  methods  were  investigated,  each  taking  account  of  the  effects  of  systems 
failure  on  flight  or  mission  success,  in  both  peace  time  and  war  time  operation.  The 
first  three  methods  considered  were  all  discounted  (Lancaster,  P.A.,  1980).  By  adoption 
of the fourth method the CARL chart as shown in TABLE I was compiled. Briefly, this
method was developed through discussions with engineers whose experience spanned a number
of aircraft projects, where the INCOMS systems were broken down into five discrete levels
in order of importance, see TABLE II. Also taken into account was the frequency of each
system operation during the different phases of a mission, see TABLE III. A full
description of this analysis may be found in Lancaster, P. A., 1980.
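The general shape of such a ranking, combining a criticality level with frequency of use across mission phases, can be illustrated by the following sketch. The scoring rule, the level weights, and the phase usage shown are purely hypothetical; the method actually used is described in Lancaster (1980), and this sketch does not attempt to reproduce the published rankings.

```python
PHASES = ("start", "taxi", "take-off", "cruise", "combat", "approach and land")


def rank_systems(systems, level_weight):
    """Return (name, score) pairs sorted from most to least critical.

    `systems` maps name -> (criticality level 1..5, set of phases in use).
    Hypothetical rule: score = weight of the level x number of phases used.
    """
    scores = {}
    for name, (level, phases) in systems.items():
        unknown = set(phases) - set(PHASES)
        if unknown:
            raise ValueError(f"unknown mission phase(s): {sorted(unknown)}")
        scores[name] = level_weight[level] * len(phases)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative weights and phase usage only (not taken from the paper).
weights = {1: 20, 2: 12, 3: 8, 4: 4, 5: 1}
systems = {
    "fuel management": (1, set(PHASES)),
    "arrestor hook": (4, {"approach and land"}),
    "seat adjustment": (5, {"start"}),
}
print(rank_systems(systems, weights))
```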


SYSTEM                    RANKING     SYSTEM                       RANKING

FUEL SYSTEM                 130       CABIN TEMPERATURE CONTROL       54
HYDRAULICS-CONTROL          125       NOSE WHEEL STEERING             52
GEAR BOX CONTROL            120       DEPRESSURISATION                52
IGNITION MANAGEMENT         105       BRAKE/ANTI-SKID                 45
ENGINE CONTROL SERV         100       WINDSCREEN HEATING              28
HYDRAULICS UTILITIES        100       COCKPIT LIGHTING                28
APU/EPU                     100       THRUST REVERSE                  22
SYSTEMS WARNINGS             95       ARRESTOR HOOK                   26
OXYGEN                       92       CANOPY DE-MIST                  20
U/C CONTROLS AND IND.        90       HIT DETECTION                   16
EQUIP. BAY COOLING           84       HEALTH MONT/MAINT               15
N.B.C.                       80       RAIN DISPERSAL                  15
AIR SYSTEM CONTROL           72       CANOPY CONTROL                  14
CABIN ALTITUDE               69       ANTI-COLLISION LIGHTS           13
REFUEL PROBE                 68       NAV/OBST LIGHTS                 12
PROBE HEATING                64       E.L. PANELS                     10
FIRE DETECTION               60       LAND/TAXI LIGHTS                10
INTAKE DE-ICING              57       SEAT ADJUSTMENT                  3
ENGINE START                 55

TABLE I - CRITICALITY ANALYSIS RANKING LIST (CARL)




LEVEL   SYSTEMS

1       FUEL MANAGEMENT
        ENGINE START
        IGNITION MANAGEMENT
        ENGINE CONTROL SERVICES
        GEAR BOX CONTROL
        APU/EPU
        HYDRAULICS-CONTROLS
        UNDERCARRIAGE
        NBC
        SYSTEMS WARNINGS

2       FIRE DETECTION
        HYDRAULIC UTILITIES
        DEPRESSURISATION
        NOSE WHEEL STEERING
        PROBE HEATING
        AIR SYSTEM
        REFUEL PROBE
        HIT DETECTION
        EQUIP. BAY COOLING
        OXYGEN

3       INTAKE DE-ICING
        BRAKES AND ANTI-SKID
        CABIN ALTITUDE
        HEALTH MON./MAINT. RECORDING
        CABIN TEMP CONTROL

4       WINDSCREEN HEATING
        COCKPIT LIGHTING
        CANOPY DE-MIST
        ARRESTOR HOOK
        E.L. PANEL
        THRUST REVERSE

5       CANOPY CONTROL
        RAIN DISPERSAL
        LAND/TAXI LIGHTS
        ANTI-COLLISION LIGHTS
        NAV./OBST. LIGHTS
        SEAT ADJUSTMENT

TABLE II - CRITICALITY LEVELS


Mission phases (columns): START; TAXI; TAKE-OFF; CRUISE; COMBAT; APPR & LAND.

Systems (rows):

ENG. IGNITION
ENG. START
ENG. CONT. SERV (R.P.M.)
ENG CONTROL
ENG. INTAKE DE-ICING
THRUST REVERSE
FIRE DETECTION AND SUPP.
ENG. GEAR BOX DRIVE
APU/EPU
HYD. UTL.
HYD. CONT.
DEPRESSURISATION
BRAKES AND ANTI-SKID
CANOPY CONTROL
U/C CONT. AND IND.
NOSE WHEEL STEERING
FLIGHT REFUELLING PROBE
FUEL BOOST PUMPS
LP COCKS
RE/DEFUEL AND TRANS.
FUEL DUMP
FUEL GAUGING
FUEL FLOWMETERING
HIT DET. AND SUPP.
CABIN TEMP. CONTROL
AIR SYS. CONT.
RAIN DISPERSAL
CANOPY DE-MIST
EQUIP. BAY COOLING
N.B.C.
COCKPIT LIGHTING
E.L. PANEL
LAND/TAXI LIGHTS
ANTI-COLL. LIGHTS
NAV/OBST LIGHTS
W/S HEATING
PROBE HEATING
SEAT ADJUST
SYSTEMS WARNINGS
HEALTH MONT/MAINT REC.
CABIN ALTITUDE
OXYGEN

KEY:  -----  Continuous Operation;  *****  Possible Systems Requirement;  XXXXX  Continuous Monitoring

(The per-phase entries of the chart are not legible in this copy.)

TABLE III - SYSTEM UTILISATION DURING A MISSION




The  analysis  gives  an  indicator  of  the  integrity  targets  that  must  be  met  for  the 
individual systems, and for the total INCOMS system. In conjunction with the
geographical  position  of  the  sensors  and  actuators  it  also  helps  in  the  allocation  of 
tasks  to  individual  processors.  As  a  result  the  number  of  interfaces  at  each 
Processor  can  be  defined. 


4.  CONTROL SYSTEM DESIGN REQUIREMENTS

To  meet  the  requirements  of  the  Utility  Systems,  particularly  in  respect  of  reliability 
and integrity, without imposing intolerable burdens of mass, complexity and cost requires
careful evaluation of alternative approaches to system design. It is essential to think
in  system  level  terms  at  the  outset  so  that  the  design  philosophy  encompasses  all 
components  and  requirements  of  the  system.  Some  of  the  special  features  and  constraints 
of  the  application  and  their  influence  on  the  major  components  of  the  control  system  are 
discussed  in  the  following  sub-sections. 

4.1  Control Requirements and Characteristics

It  is  convenient  to  categorize  the  control  tasks  in  Utility  Systems  on  a  hierarchical 
basis,  as  shown  in  TABLE  IV.  At  the  lowest  level  most  of  the  data  is  in  single  bit 
quantities  representing  physical  parameters  such  as  limit-switch  position  or  valve  open- 
close  command.  Additionally,  there  is  a  lesser  number  of  data  values  related  to 
analogue  quantities  such  as  temperature  and  pressure.  Processing  at  this  level  consists 
largely  of  packing  and  unpacking  words  for  computation  and  transmission,  evaluation  of 
Boolean  expressions  for  load  control,  and  the  generation  of  status  information  for 
individual  Utility  components  and  their  interfaces. 
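
The lowest-level processing described above can be illustrated by a minimal sketch. It assumes a 16-bit data word and an invented bit layout for one fuel transfer valve; the bit names and the interlock expression are hypothetical and are not taken from the INCOMS design.

```python
def pack_discretes(bits):
    """Pack up to 16 single-bit discretes (bit 0 first) into one word."""
    if len(bits) > 16:
        raise ValueError("a 16-bit word holds at most 16 discretes")
    word = 0
    for position, bit in enumerate(bits):
        if bit:
            word |= 1 << position
    return word


def unpack_discretes(word, count):
    """Recover `count` discretes from a packed word."""
    return [bool((word >> position) & 1) for position in range(count)]


# Hypothetical bit assignments for one fuel transfer valve (illustration only).
VALVE_OPEN_LIMIT = 0    # limit switch: valve fully open
VALVE_CLOSED_LIMIT = 1  # limit switch: valve fully closed
TRANSFER_DEMAND = 2     # transfer requested by the sub-system function
TANK_LOW_LEVEL = 3      # low-level sensor in the source tank


def transfer_pump_command(word):
    """Boolean load-control expression for a hypothetical transfer pump.

    The pump runs only when transfer is demanded, the valve is confirmed
    open (and not also reported closed), and the source tank is not low.
    """
    bits = unpack_discretes(word, 4)
    return (bits[TRANSFER_DEMAND]
            and bits[VALVE_OPEN_LIMIT]
            and not bits[VALVE_CLOSED_LIMIT]
            and not bits[TANK_LOW_LEVEL])
```

Rejecting the case in which both limit switches are set is one simple way of generating status information from the same data: a contradictory pair points to a sensor or interface fault rather than to a valve position.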


SYSTEM EXECUTIVE
(System functions, status, bus control)

SUB-SYSTEM EXECUTIVES
(Mode selection, status monitoring)
    FUEL EXECUTIVE
    HYDRAULICS EXECUTIVE
    SEAT ADJUSTMENT EXECUTIVE

SUB-SYSTEM FUNCTIONS
    Fuel:             transfer, engine re-light, quantity calc., etc.
    Hydraulics:       engine start, pressure scheduling, etc.
    Seat adjustment:  raise/lower interlock

COMPONENT LEVEL FUNCTIONS
    Fuel:             valve/pump control, level and flowrate sensing, etc.
    Hydraulics:       valve control, fluid level sensing, etc.
    Seat adjustment:  power switching to motor

TABLE  IV  -  CLASSIFICATION  OF  UTILITY  SYSTEM  CONTROL  TASKS 


At  the  next  level  the  basic  functions  or  operating  modes  of  the  Utility  sub-systems  are 
accomplished.  For  example,  in  the  fuel  system  some  of  the  modes  would  be  fuel  quantity 
computation,  refuel/defuel,  transfer,  engine  start-up,  engine  relight  and  system  status 
monitoring. Most functions at these levels would involve logical processes and
sequences  with  a  lesser  number  of  analogue  functions  such  as  control  of  braking 
deceleration.  Each  of  the  functions  may  either  be  fixed  or  one  of  a  set  of  related 
functions  to  cover  different  flight  regimes,  aircraft  status,  or  failure  modes. 

The selection of individual modes would be made by the system executive in response to
pilot commands and status data. Practical considerations indicate the desirability of
splitting this executive function into two levels. The lower of these extends as far
as the sub-system boundary so that a vertical structure is now imposed on the functional
diagram (TABLE IV) to separate sub-systems into distinct modules. This both eases the
design process and improves visibility and integrity, and accords with techniques for
structured system design. The main functions of the topmost executive level would now be
to instruct individual sub-system executives as to the desired mode of operation, handle
data flow in the system and perform global functions such as load shedding or engine
relight which require action in a number of sub-systems. In existing aircraft most of
the executive functions are performed by the pilot, in addition to many of the sequences
and some of the individual component level switching and monitoring actions. Replacement
by computer control will obviously reduce pilot workload significantly, although a
system which leaves the pilot without control is neither desirable nor acceptable.

Means  must  be  provided  to  enable  the  pilot  to  monitor  system  operation  and  to  over-ride 
executive  functions  when  desired. 
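
The two-level executive with pilot over-ride might be sketched as follows. This is an illustration only: the class names, method names and mode names are hypothetical, and the rule that a pilot selection blocks automatic demands until it is cleared is an assumed policy, not one stated in the paper.

```python
class SubSystemExecutive:
    """Lower executive level: selects one operating mode for a sub-system."""

    def __init__(self, name, modes):
        self.name = name
        self.modes = set(modes)
        self.mode = None

    def select_mode(self, mode):
        if mode not in self.modes:
            raise ValueError(f"{self.name} has no mode {mode!r}")
        self.mode = mode


class SystemExecutive:
    """Topmost level: instructs sub-system executives, subject to pilot over-ride."""

    def __init__(self, executives):
        self.executives = {executive.name: executive for executive in executives}
        self.overrides = {}

    def pilot_override(self, name, mode):
        """Pilot selection takes effect at once and blocks automatic demands."""
        self.executives[name].select_mode(mode)
        self.overrides[name] = mode

    def clear_override(self, name):
        self.overrides.pop(name, None)

    def command(self, name, mode):
        """Automatic mode demand; returns the mode actually in force."""
        if name not in self.overrides:
            self.executives[name].select_mode(mode)
        return self.executives[name].mode
```

Returning the mode in force from every demand gives the monitoring path the paper asks for: the cockpit display can always show what each sub-system is doing and whether that is the result of a pilot selection.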

The  processing  requirements  of  the  sub-systems  vary  considerably  from  the  complex  (e.g. 
fuel) to the very simple (e.g. seat adjustment). In the case of the simplest the
classification  into  function  level  may  appear  academic  since  only  minimal  processing  is 
required  at  each  level,  for  example  to  pass  a  discrete  signal  from  the  cockpit  to  the 
seat  motor.  However,  there  is  value  in  retaining  the  classification  for  consistency  in 
specification,  design  and  implementation. 

The  partitioning  described  suggests  that  distributed  processing  could  be  appropriate  on 
either  a  horizontal  (sub-system)  or  a  vertical  (function  level)  basis.  The  design  of  a 
system  which  assigns  groups  of  sub-systems  to  different  processors,  with  separate  units 
to  perform  the  System  Executive  function  is  described  later  in  section  5.2. 

4.2  Reliability and Integrity

The  critical  nature  of  many  Utility  sub-systems  to  aircraft  safety  has  already  been 
underlined.  The  reliability  of  mechanical  and  electrical  components  demands  the  use  of 
varying  levels  of  redundancy  to  achieve  sufficient  safety  and  this  translates  into  the 
control  system.  A  simple  approach  would  be  to  determine  the  level  of  redundancy  required 
to  meet  the  most  critical  needs  and  apply  this  at  all  levels  of  the  system  from  processors 
to interfaces. However, this would lead to significant mass and cost penalties due partly
to  the  large  number  of  discrete  signal  sources  and  sinks.  Consider  the  basic  output 
interface  function  of  load  switching:  to  provide  dual  redundancy  against  open  and  short- 
circuit  failures  of  the  switch  elements  requires  them  to  be  quadruplicated,  as  shown  in 
Figure  8.  This  arrangement  must  be  used  because  Utility  components  do  not  in  general 
lend  themselves  to  a  redundant  configuration  similar,  for  example,  to  actuators  for  flight 
control:  separate  actuators  or  sensors  must  be  replicated  to  achieve  redundancy. 

Since there may be around 200-500 discrete outputs on an aircraft the total mass penalty
could be considerable, with current solid-state switches having a mass of around 0.04 to
0.20 kg depending upon rating. N-fold redundancy requires N^2 switch elements, so that the
application of redundancy at the discrete output level needs to be tailored to the needs
of individual circuits rather than applied wholesale. The same consideration should be
applied at all interface circuits since studies have shown that they dominate the mass
and complexity of the local control units owing to the large number required.
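
The scale of the penalty can be estimated with a short calculation. The sketch below assumes that switch count grows as the square of the redundancy level, which is consistent with the quadruplicated dual-redundant switch of Figure 8, and it ignores drive circuits, wiring and packaging; the function names are illustrative.

```python
def switch_elements(redundancy):
    """Switch elements per discrete output for N-fold redundancy against
    both open-circuit and short-circuit failures (assumed N squared)."""
    return redundancy ** 2


def switch_mass_kg(outputs, mass_per_switch_kg, redundancy):
    """Total switch mass if the same redundancy is applied to every output."""
    return outputs * switch_elements(redundancy) * mass_per_switch_kg


if __name__ == "__main__":
    # Bounds quoted in the text: 200-500 outputs, 0.04-0.20 kg per switch.
    for redundancy in (1, 2, 3):
        low = switch_mass_kg(200, 0.04, redundancy)
        high = switch_mass_kg(500, 0.20, redundancy)
        print(f"{redundancy}-fold: {low:.0f} to {high:.0f} kg")
```

On these assumptions dual redundancy applied wholesale multiplies the simplex switch mass by four and triplex by nine, which is the argument for tailoring redundancy to individual circuits.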

The optimum solution for the Utility Systems would be to accept varying levels of
redundancy throughout the system for different circuits and sub-systems and at the
different function levels. Many mechanical and electrical components have failure
rates of the order of 10^-4/hour, and for their related interfaces duplex or even simplex
redundancy would be adequate. However, a basic principle of their design is that to
reduce the probability of multiple failures individual sub-systems are isolated, hence
higher levels of redundancy are essential for processors and data terminals which could
cause multiple function loss upon failure. An acceptable level of total control failure
is unlikely to be less than around 10^-5/mission and studies have shown (Collingbourne, L. R.,
1981) that at least 3-fold redundancy will be required based on reasonable estimates of
failure rates and success for BIT procedures. At the local terminal level lower
redundancy  levels  could  be  acceptable  since  each  handles  only  part  of  the  system.  This 
aspect  of  design  involves  careful  analysis  of  Utility  System  operation  so  that  signals 
can  be  grouped  to  avoid  mutual  reinforcement  of  disaster.  This  would  occur,  for  example, 
if  control  of  wheel-brakes  and  reverse  thrust  were  obtained  from  the  same  terminal  but  not 
for  the  combination  of  fuel-level  and  reverse  thrust.  Finally,  use  can  be  made  of 
reversionary  modes  in  the  event  of  terminal  or  system  failures  so  that  loads  assume  a 
preferred  state,  e.g.  fuel  pumps  permanently  on. 
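
The interaction between redundancy level and BIT success can be explored with a crude model. The sketch below is not the analysis of the cited study: it assumes independent channel failures, voting that always removes a failed channel while three or more remain, and a single BIT coverage figure for the final arbitration between two channels. The failure rate, mission length and coverage used in the test are hypothetical.

```python
from math import comb, exp


def channel_failure_probability(rate_per_hour, mission_hours):
    """Probability that one channel fails during a mission (exponential model)."""
    return 1.0 - exp(-rate_per_hour * mission_hours)


def probability_exactly(k, n, p):
    """Probability that exactly k of n independent channels fail."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def loss_probability(channels, p, bit_coverage):
    """Crude per-mission probability of total control loss.

    Simplex: any failure loses control. Otherwise control is lost if every
    channel fails, or if all but one fail and BIT cannot pick the survivor.
    """
    if channels == 1:
        return p
    arbitration_miss = (probability_exactly(channels - 1, channels, p)
                        * (1.0 - bit_coverage))
    return arbitration_miss + p ** channels


def minimum_channels(p, bit_coverage, target, limit=6):
    """Smallest redundancy level meeting the target, or None within the limit."""
    for channels in range(1, limit + 1):
        if loss_probability(channels, p, bit_coverage) <= target:
            return channels
    return None
```

With imperfect BIT the duplex figure is dominated by the arbitration term rather than by double failures, so improving channel reliability alone does little; this is one way of seeing why the processors and terminals need a higher redundancy level than the components they control.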

Assuming that Avionic requirements lead to the use of a duplex data bus for communication
with the cockpit, a subsidiary panel and data link providing flight safety instruments
and controls for the most vital Utility functions is likely to be necessary. This
solution  could  be  preferable  to  penalising  the  main  Avionic  bus  with  requirements  for  a 
higher  level  of  redundancy. 

4.3  Data System

Provision  of  a  data  system  to  reduce  the  mass  and  complexity  of  the  cabling  linking  widely 
dispersed  Utility  components  has  been  shown  to  be  a  prime  requirement  for  future  designs. 
The widespread adoption of MIL-STD-1553B for national (e.g. UK Defence Standard 00-18
(Part 2)/1) and international standards for military aircraft is a powerful argument for
its  adoption  for  Utility  Systems  and  most  of  the  work  in  the  UK  has  assumed  its  use. 

Studies have shown (Wilcock, G. W., 1978; Moir, I., 1981) that for a centralised system
a bus loading of around 25-38% is likely to result. Such a relatively high level
supports the argument for provision of a separate Utility System bus with an interface to
the main Avionic and cockpit busses so that traffic on the latter is increased only by
the smaller amount needed for cockpit control and display.
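
A bus loading figure of this kind can be estimated from a message schedule. The sketch below uses the MIL-STD-1553B word length of 20 bit times at 1 Mbit/s and counts a command word, the data words and a status word per transfer; the response time and inter-message gap are assumed values within the limits of the standard, and any traffic list supplied to it is hypothetical rather than taken from the cited studies.

```python
WORD_TIME_US = 20.0        # one 1553B word: 20 bit times at 1 Mbit/s
RESPONSE_TIME_US = 12.0    # assumed worst-case remote terminal response time
INTERMESSAGE_GAP_US = 4.0  # assumed minimum gap between messages


def message_time_us(data_words):
    """Bus time for one controller/terminal transfer of 1-32 data words."""
    if not 1 <= data_words <= 32:
        raise ValueError("a 1553B message carries 1 to 32 data words")
    words = 1 + data_words + 1  # command word + data words + status word
    return words * WORD_TIME_US + RESPONSE_TIME_US + INTERMESSAGE_GAP_US


def bus_loading(messages):
    """Fraction of bus time used by (data_words, rate_hz) message entries."""
    busy_us_per_second = sum(message_time_us(words) * rate
                             for words, rate in messages)
    return busy_us_per_second / 1_000_000.0
```

Because each transfer carries a fixed overhead of two words plus gaps, many short messages of packed discretes load the bus far more than their data content suggests, which is why moving low-level processing into the terminals reduces traffic.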

A loading of 25-38% is acceptable for the Utilities bus, although since these values were
based on relatively early designs lacking many of the higher level control tasks it is a
factor to be kept under careful scrutiny. A distributed processing system would tend to
alleviate problems by reducing the amount of low level data on the bus. The need for a
bus-controller in MIL-STD-1553B systems tends to militate against a distributed approach,
although it is compatible with a hierarchical system where it could be incorporated at
system executive level.

A practical consideration which has a major influence on system design is the relative
complexity of MIL-STD-1553B terminals, comparable in LSI chip count and failure rate with
microprocessors. The implication is that the penalties of incorporating intelligence
into terminals in terms of increased failure rate, mass and volume will be small. This,
then, is a strong indication for distributed processing based on microprocessors within
the data terminals.


4.4  Software

Although  the  basic  control  algorithms  for  individual  Utility  components  are  relatively 
simple  the  overall  control  task  which  must  allow  for  fault  detection,  corrupt  data  and 
automatic control in a system with of the order of 600 inputs and outputs is formidable.
Practical  constraints  of  mass  and  complexity  do  not  permit  monitoring  of  mechanical  and 
electrical  Utilities  to  the  degree  necessary  to  uniquely  identify  all  possible  single 
faults. Multiple faults are less common but do occur and could be extensive following
battle damage. Loss of power at one or more electrical bus-bars causes widespread
corruption  of  data  from  sensors  supplied  from  them.  The  task  of  coping  with  these 
problems  falls  largely  to  the  aircrew  at  present.  Automatic  control  must  attempt  to 
duplicate  to  some  extent  the  partly  intuitive  logic  processing  of  the  aircrew  to  be 
wholly  successful,  although  such  complete  control  must  await  advances  in  the  state  of  the 
art.  Nevertheless,  computer  based  system  monitoring  and  automatic  control  designed  on 
existing  knowledge  could  significantly  reduce  pilot  workload.  The  problem  still  remains 
that  software  must  cope,  without  failure,  with  corrupted  data  and  incompletely  defined 
system status, magnifying the problems of software design and validation. An approach
which results in "resilient" software is desirable. An assessment of this problem
(Collingbourne, L. R., 1980) has shown the value of the appropriate high-level language
and a structured self-documenting approach, whilst the SAFRA methodology (Ward, A. O., 1979)
has been used in trials which show that the method leads to a rigorously obtained, fully
documented requirement. The use of text processing and automated aids for data
consistency checking used in this method make it a valuable aid in a structured design
process. A particular problem in the Utility Systems is the large number of autonomous
simple functions which would lead to processing inefficiency if each were modularized.
A possible solution is to group simple functions into modules, relying on the
self-documenting features of a language such as CORAL 66 to achieve integrity through visibility.

A distributed approach to computing would ease software problems by reducing the size of
individual programs, although the provision of a validated distributed executive would be
a critical task. The MASCOT system (MASCOT Suppliers Association, 1980) for multi-tasking,
although mainly thought of for use on centralised systems at present, could be applied to
distributed processors and would be a valuable system building tool.


5.  SOME APPROACHES TO SYSTEM DESIGN

Studies are currently in progress to investigate alternative approaches to system design.
The optimum choice for a particular application is affected by procurement considerations as
well as technical factors and definitive answers have yet to be produced. In this section
the two basic approaches of centralised and distributed processing are considered with
emphasis on the latter. It was considered that in dealing with safety critical systems a
pessimistic rather than an optimistic view of failure was appropriate, hence the need for
relatively high levels of redundancy and integrity has been assumed. With practical
experience some simplification might be achievable.

5.1  Processing Based on Parallel Redundancy

Synchronized parallel redundancy is an established technique for detecting and isolating
failures with high confidence and integrity. Triple modular redundancy (TMR) is the
minimum level capable of meeting the requirements of system failure rate and fault level
capability. The architecture of a possible system is shown in Figure 5. The system and
bus controllers and software at all function levels are triplicated and linked via a
triplex bus to 6 local data terminals adjacent to Utility components. Within local
units the data bus terminals and interface controllers are triplicated. Discrete inputs
are cross-strapped to all 3 channels with relatively little penalty in complexity, but it
is sufficient to reduce to duplex redundancy in individual discrete outputs since
availability requirements are less than at terminal or processor levels. There is a range
of possibilities for distribution of the voting and data consolidation points within the
system. That shown in Figure 5 enables failures within the interfaces and their
controllers to be isolated at local level but data terminals are included as part of the
bus structure. Consolidation could be included between busses and terminals, increasing
system availability at the expense of added complexity.
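
The voting and consolidation step in a triplex arrangement can be sketched briefly. The paper does not specify the voter; majority voting for discretes and mid-value selection for analogue quantities are assumed here as common choices, and the function names are illustrative.

```python
from collections import Counter


def vote_discrete(lanes):
    """Majority vote over three lanes.

    Returns the agreed value and the indexes of any disagreeing lanes,
    which become candidates for isolation.
    """
    if len(lanes) != 3:
        raise ValueError("triplex voting expects exactly three lanes")
    value, count = Counter(lanes).most_common(1)[0]
    if count < 2:
        raise ValueError("no two lanes agree")
    suspects = [index for index, lane in enumerate(lanes) if lane != value]
    return value, suspects


def mid_value_select(lanes):
    """Consolidate three analogue readings by taking the middle value."""
    if len(lanes) != 3:
        raise ValueError("mid-value selection expects exactly three lanes")
    return sorted(lanes)[1]
```

Mid-value selection needs no tolerance setting and rejects a single wild reading whether it is high or low, at the cost of giving no explicit indication of which lane is at fault.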

A disadvantage of TMR is the prospect of common-mode failure due to causes such as EMC,
EMP, common servicing defects, generic software failures, propagation of electrical faults,
or failure of BIT arbitration after 2 failures. The relatively long time-constant of
most Utilities should enable recovery from transient effects such as EMC. Electrical
isolation could be achieved using fibre-optic links between channels, although at some
penalty in complexity. An alternative approach which is particularly appropriate since
mechanical and electrical systems are already largely based on a two channel structure
(designated left and right channels) is to use a dual duplex system, as shown in Figure 6.
Isolated left and right-hand dual busses are used, with duplex redundancy within each
system and bus controller, and within local terminals. Comparison of channels is used
to detect failures, with BIT to determine the faulty channel. The failures are detected
with high confidence, although the probability of incorrect arbitration is increased
relative to TMR. However, any defects in this area are confined to one half of the
system so that overall integrity and availability requirements are met. Assuming the
need to communicate with a duplex Avionics bus, dual links to each duplex Utility bus are
needed so that a single fault does not result in loss of function. Fibre optic links
are desirable to preserve electrical isolation.
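
Detection by comparison followed by BIT arbitration might look as follows in outline. This is a sketch under assumptions: the BIT routines are represented by caller-supplied functions, and falling back to a preferred load state when arbitration is inconclusive is an assumed policy borrowed from the reversionary modes mentioned in section 4.2.

```python
def duplex_select(lane_a, lane_b, bit_a, bit_b, safe_value):
    """Choose an output from two lanes of a duplex channel.

    bit_a and bit_b are callables returning True if that lane passes its
    built-in test. Returns the selected output and a status string.
    """
    if lane_a == lane_b:
        return lane_a, "agree"
    # Disagreement detected by comparison: run BIT to find the faulty lane.
    a_ok, b_ok = bit_a(), bit_b()
    if a_ok and not b_ok:
        return lane_a, "lane B isolated"
    if b_ok and not a_ok:
        return lane_b, "lane A isolated"
    # Both pass or both fail: arbitration is inconclusive, so revert.
    return safe_value, "arbitration failed"
```

BIT is run only after a disagreement, so its coverage affects the probability of incorrect arbitration but not the probability of detection, which rests on the comparison alone.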

Both  of  the  systems  described  are  complex  in  terms  of  both  the  number  and  capabilities 
of  the  functional  units.  Although  only  3  processors  are  required  for  TMR  they  must  each 
have the capability to handle total system and bus control. Mission reliability could be
degraded using TMR, because missions may have to be aborted following single failures when
further ones could hazard flight safety. In the dual duplex arrangement vulnerability
considerations  are  a  strong  indication  for  separation  of  the  processors  in  each  duplex 
channel,  increasing  the  number  of  boxes  and  complexity  since  bus  terminals  would  have  to 
be  duplicated  for  each  processor  rather  than  for  pairs  of  processors. 

5.2  Distributed System

The  dual  duplex  system  represents  a  stage  in  the  progression  to  distributed  systems  since 
it  effectively  separates  processing  between  left  and  right-hand  channels  each  controlling 
only  half  of  the  Utility  System.  The  next  stage  is  to  incorporate  local  processors  within 
the  already  distributed  data  terminals.  Although  the  system,  shown  in  Figure  7, 
resembles  the  dual  duplex  system  symbolically,  its  characteristics  and  operation  are  quite 
distinct.  Its  design  is  based  on  the  partitioning  scheme  described  earlier.  There  are 
practical  limits  on  inter-system  (horizontal)  partitioning  since  it  would  be  wasteful  of 
resources  to  provide  individual  processors  for  the  many  sub-systems  involved,  i.e.  30-40, 
nor  would  it  be  appropriate  in  view  of  their  widely  differing  processing  requirements. 

A  preferred  approach  would  be  to  distribute  groups  of  sub-systems  to  different  groups  of 
processors,  the  redundant  functions  of  each  sub-system  being  partitioned  between  different 
processors  in  a  group.  Allocation  to  a  particular  processor  would  involve  considerations 
of  component  layout,  processor  utilization,  and  reliability  requirements.  Separate 
processors  are  shown  to  perform  the  function  of  the  System  Executive,  with  bus  control 
(assuming  a  MIL-STD-1553B  based  data  system)  and  communication  with  the  duplex  Avionics 
bus through the Bus Interface Unit (BIFU) as appropriate subsidiary tasks. Alternatively,
a  distributed  executive  could  be  incorporated;  although  this  would  minimise  the  number  of 
processors  it  would  impose  additional  requirements  for  processing  and  storage  on  those 
remaining  and  might  lead  to  a  nett  penalty. 

A  dual  duplex  Utility  bus  arrangement  is  used  to  increase  integrity  through  isolation, 
preserved  by  fibre-optic  links  to  the  BIFU.  These  links  could  also  be  used  for  inter- 
processor  communication  at  executive  level  which  would  increase  resilience  to  failures  in 
both  the  control  system  and  Utility  components  by  facilitating  interchange  of  status 
information  and  rescheduling  of  tasks. 

Individual  sub-system  tasks  such  as  fuel,  air  system  control  or  brakes  are  assigned  to 
groups  of  local  terminals  according  to  the  level  of  redundancy  required.  In  the  event 
of  local  failures  the  tasks,  including  the  sub-system  executives,  are  re-assigned  to  a 
different  terminal.  In  the  event  that  this  results  in  excessive  loading  on  the  remaining 
terminals  the  tasks  would  be  executed  on  a  priority  basis,  resulting  in  a  gradual 
degradation  of  system  performance  and  response  time.  This  is  preferable  to  abrupt 
failure,  particularly  in  permitting  time  for  corrective  pilot  action,  for  example  changes 
of  flight  regime  or  aircraft  configuration.  Since  many  Utility  sub-systems  are  only 
required  in  particular  areas  of  the  flight  regime  they  lend  themselves  to  this  approach. 
Rescheduling  of  tasks  could  be  controlled  by  the  system  executive  processors,  but  in  the 
event  of  failure  here  a  task  rescheduling  protocol  could  be  activated  based  on  a 
distributed  executive  in  local  processors.  This  would  also  require  a  local  executive  to 
take  over  as  bus  controller,  perhaps  actuated  after  cross  monitoring  detects  a  failure. 
Obviously  a  new  approach  to  data  transmission  which  did  not  require  a  bus  controller  and 
which lent itself to the protocols of a distributed system would be advantageous in
simplifying  local  units  and  software,  and  improving  system  reliability. 
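
Re-assignment of tasks on a priority basis can be sketched as a simple allocation. The task names, priorities, loads and terminal names below are hypothetical, and the sketch sheds the lowest-priority tasks outright when capacity runs out, which is a simplification of the gradual degradation of performance and response time described above.

```python
def reschedule(tasks, capacity, failed):
    """Re-assign tasks to surviving terminals in priority order.

    tasks:    list of (name, priority, load, home_terminal); a lower
              priority number means a more important task.
    capacity: dict mapping terminal name to available processing capacity.
    failed:   set of failed terminal names.
    Returns (assignment dict, list of shed task names).
    """
    spare = {terminal: limit for terminal, limit in capacity.items()
             if terminal not in failed}
    assignment, shed = {}, []
    for name, _priority, load, home in sorted(tasks, key=lambda task: task[1]):
        # Prefer the home terminal, then any other survivor with room.
        candidates = ([home] if home in spare else []) + [
            terminal for terminal in sorted(spare) if terminal != home]
        for terminal in candidates:
            if spare[terminal] >= load:
                spare[terminal] -= load
                assignment[name] = terminal
                break
        else:
            shed.append(name)
    return assignment, shed
```

Because tasks are placed in priority order, a high-priority task displaced from a failed terminal can take capacity from a low-priority task native to the surviving terminal. The priority table could be switched with flight regime, since many sub-systems are needed only in particular phases.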

It  is  unlikely  to  be  acceptable  for  single  failures  to  lead  to  loss  of  control  over  the 
relatively  large  number  of  loads  connected  to  a  terminal,  thus  the  use  of  dual  redundancy 
is  indicated  at  processor,  data  terminal  and  bus  level  for  both  the  left  and  right 
channels.  A  possible  terminal  architecture  is  shown  in  Figure  9.  Although  the  number 
of  processors  is  now  increased  their  required  capability  has  been  reduced  by  distribution 
and  it  is  probable  that  suitable  single-chip  devices  will  be  available  in  the  time-scale 
of  need.  Since  the  mass  and  complexity  of  the  local  units  has  been  shown  to  be  dominated 
by the large number of interface circuits for the Utility components the practical
penalties  of  the  microprocessors  will  be  small.  To  reduce  the  number  of  separate  boxes 
to  be  installed  the  system  executive  processors,  shown  as  stand  alone  units  in  Figure  7, 
could  be  integrated  with  local  processors. 

The methods used to identify failures at each level are critical design factors. Failures
of Utility components and interfaces could be detected by their related local processor
using techniques such as loop tests, data reasonability checks, and comparison with
effectively redundant data, for example comparison of gauged fuel quantity with calculated
values based on integration of flow-rate. Processors could similarly test their
associated data terminals. Failure detection for processors and data terminals involves
comparison of lanes. Single faults can thus be detected with high confidence and BIT
action initiated to determine the failed unit. The effectiveness of BIT arbitration
could be improved by involving processors in other terminals to test the faulty terminal,
since cross-monitoring in those terminals provides high confidence of their correct
operation. The involvement of other terminals enables a voting procedure to be
introduced without the penalties of increasing the redundancy level within individual
terminals.
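
The fuel quantity example of comparison with effectively redundant data can be written out as a short check. The units, sample interval, tolerance and tank capacity are assumptions made for illustration, and trapezoidal integration is one simple choice among several.

```python
def integrated_fuel_kg(initial_kg, flow_samples_kg_per_s, dt_s):
    """Fuel remaining, from trapezoidal integration of measured flow-rate."""
    used = 0.0
    for earlier, later in zip(flow_samples_kg_per_s, flow_samples_kg_per_s[1:]):
        used += 0.5 * (earlier + later) * dt_s
    return initial_kg - used


def gauge_is_reasonable(gauged_kg, calculated_kg, tolerance_kg, capacity_kg):
    """Data reasonability check on a gauged fuel quantity.

    The reading must lie within the physical range of the tank and agree
    with the flow-derived estimate to within the stated tolerance.
    """
    in_range = 0.0 <= gauged_kg <= capacity_kg
    consistent = abs(gauged_kg - calculated_kg) <= tolerance_kg
    return in_range and consistent
```

A disagreement does not by itself show which source is wrong, since a leak or a flowmeter fault produces the same symptom as a gauging fault; the check flags the data as suspect and leaves arbitration to further tests.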

It  can  thus  be  seen  that  the  quoted  advantages  of  distributed  processing  such  as  high 
integrity,  cost-effective  use  of  processor  resources,  reduced  vulnerability  to  battle 
damage  and  more  gradual  and  easily  contained  degradation  under  fault  conditions  should  be 
obtainable  in  the  application  to  aircraft  mechanical  and  electrical  systems.  Such  systems 
have characteristics which lend themselves to the distributed approach, the major factors
being: 

a)  Mechanical  and  electrical  components  are  widely  distributed  in  the  aircraft. 

b)  Tasks  are  largely  autonomous  at  the  lower  functional  levels. 

c)  The  integrity  and  reliability  requirements  of  different  sub-systems  vary  widely  so 
that  a  parallel  redundant  approach  based  on  the  most  stringent  requirements  is 
inefficient. 

d)  Most  sub-systems  can  tolerate  relatively  long  breaks  in  service  enabling  task 
rescheduling  and  reconfiguration  to  be  achieved  using  resources  normally  performing 
lower  level  functions.  The  rapid  response  of  synchronous  parallel  redundant 
systems  is  not  required. 

e)  The  wide  range  of  sub-system  priorities  and  the  capability  to  rank  them  according  to 
flight  regime  is  well  suited  to  a  distributed  processing  scheme  in  which  efficient 
use  is  made  of  processing  resources  combined  with  a  progressive  reduction  in  system 
capability  under  failure  conditions. 

6.  BENEFITS 

The  work  to  date  has  been  based  largely  on  the  system  shown  in  Figure  4.  A  comparison 
between  this  system  and  mechanical  systems  control  on  a  current  high  technology  aircraft 
(TORNADO)  has  highlighted  the  following  quantifiable  benefits: 

a)  A  mass  reduction  in  the  order  of  50%  of  control  system  hardware  and  wiring  which 
represents approximately 100 kg in real terms.

b)  A  volume  reduction  in  the  order  of  30%  of  control  hardware. 

c)  A  reduction  in  pilot  workload  of  about  4:1  (based  on  analysis  of  an  emergency 
situation  and  the  modelling  of  a  twin  engine  start  routine) . 

Figure  10  shows  these  results  compared  with  claims  found  in  other  published  work.  This 
figure  also  shows  claims  for  improvements  in  reliability,  maintainability  and  survivability. 
These last three parameters have not yet been addressed in the INCOMS work, but it is
anticipated  that  future  work  will  obtain  similar  results. 

Further analysis of INCOMS will yield results that will enable us to establish reliability
figures.

Advanced Health Monitoring and Maintenance recording techniques that are being considered
in the Utility Systems should dramatically improve the Maintenance turn-round time.

Improved  fault  detection  capability  and  the  ability  to  reconfigure  will  improve 
survivability  aspects. 

7.  CONCLUSIONS 

Various  techniques  for  the  application  of  digital  control  to  Utility  Systems  have  been 
investigated  in  this  paper.  It  has  been  shown  that  the  preferred  approach  utilises  a 
number  of  distributed  processors  and  terminals  that  interface  with  the  Utility  components. 
Analysis  performed  to  date  shows  that  significant  reductions  in  mass  and  pilot  workload 
can  be  achieved. 

Further  work  is  required  to  refine  the  System's  design  and  to  assess  other  potential 
advantages  of  adopting  INCOMS. 

ACKNOWLEDGEMENTS 

This  work  has  been  carried  out  with  the  support  of  Procurement  Executive  MOD.  Where 
opinions  are  expressed  they  are  those  of  the  authors.  The  encouragement  of  the  Director, 
Royal  Aircraft  Establishment,  and  by  the  Directors  of  British  Aerospace,  Warton  Division, 
is  acknowledged,  as  is  the  generous  assistance  of  the  authors'  colleagues. 


REFERENCES

Collingbourne, L. R., 1980:

"Application  of  Digital  Computer  Control  to  Aircraft  Electrical  Systems" 

TR80095  Royal  Aircraft  Establishment,  Farnborough,  UK. 

Collingbourne, L. R., 1981:

"Reliability  and  Integrity  Aspects  of  Digitally  Controlled  Aircraft  Utility  Systems" 
Report  to  be  published.  Royal  Aircraft  Establishment,  Farnborough,  UK. 

Durkin,  H.,  1977: 

"Some  Engineering  Problems  in  the  RAF" 

AGARD-R-653. 

Kaye,  A.,  Swetman  R.  E.,  1979: 

"Advanced  Fuel  Management  System  Study  -  Progress  Report  1" 

BAe  Report  TNAM  3343. 

Lancaster,  P.  A.,  Moxey  C.,  1980: 

"Study  of  Systems  Management  for  Future  Military  Aircraft  -  Volume  2" 

BAe  Report  TNAM  3374,  Volume  2. 

MASCOT  Suppliers  Association,  1980: 

"The  Official  Handbook  of  MASCOT" 

Royal  Signals  and  Radar  Establishment,  Malvern,  UK. 

Moir,  I.,  Moxey,  C.,  Lancaster,  P.  A.,  1981: 

"Command  Response  Data  Transmission  Applied  to  Mechanical  Systems  Management.  Effect 
on  the  Crew  System  Interface" 

AGARD  32nd  Symposium  of  the  Guidance  and  Control  Panel. 

Moxey,  C.,  Wilson,  P.,  1980: 

"Advanced  Fuel  Management  System  Study  -  Progress  Report  3" 

BAe  Report  TNAM  3371. 

Ohlhaber, J. I., 1973:

"Aircraft  Electrical  Multiplex  System" 

IEEE  Intercon  Conference,  March  1973. 

Ohlhaber, J. I., 1979:

"An  Integrated  Systems  Approach  to  Helicopter  System  Design  using  Redundant  Multiplexing 
Techniques" 

5th  European  Rotorcraft  and  Powered  Lift  Aircraft  Forum,  September  1979. 

Ohlhaber, J. I., 1980:

"Integrated Multiplex for the Agusta A-129 Attack Helicopter"

6th  European  Rotorcraft  and  Powered  Lift  Aircraft  Forum,  September  1980. 

Rice,  C.  I.,  1977: 

"Avionic  Solutions  to  Future  Requirements" 

Interavia  News  Letter  No.  8888. 

Roth, S. P., Miller, R. J., Mihaloew, J., 1980:

"Future Challenges in V/STOL Flight Propulsion Control Integration"

SAE  Aerospace  Congress,  October  1980. 

Seabridge,  A.  G.,  1980: 

"Advanced  Fuel  Management  System  Study  -  Final  Report" 

BAe  Report  TNAM  3382. 

Seabridge, A. G., 1979:

"A  Proposed  System  Management  Centre  for  Future  Military  Aircraft" 

IEE  3rd  Int.  Conference  on  Trends  in  On-Line  Computer  Control  Systems. 

Smith,  T.  B.,  et  al,  1978: 

"A  Fault-Tolerant  Multiprocessor  Architecture  for  Aircraft" 

NASA  CR3010. 

Ward, A. O., Forsyth, D. Y., 1980:

"SAFRA:  Controlled  Requirements  Expression" 

BAe  Report  TNAAS  484. 

Wilcock,  G.  W. ,  1978: 

"Evaluation  of  a  Data  Transmission  System  Based  on  MIL-STD-1553A  for  Control  of 
Electrical  Power  Distribution  in  Aircraft" 

TR78137  Royal  Aircraft  Establishment,  Farnborough ,  UK. 


FIGURE 2 - UTILITY SYSTEMS MANAGEMENT WITH CENTRALISED CONTROL


Fig 7 Distributed processor control system
(labels: utility components, system executive processor, sub-system control terminal)


Fig 8 Redundant discrete output controller


Fig 9 Terminal architecture for a distributed processing system


FIGURE 10 - MEASURES OF EFFECTIVENESS
(measures plotted: Equipment & Wiring Mass, Equipment Volume, Pilot Workload, Reliability Improvement, Maintenance Manhours, Survivability Improvement)


KEY


A: Ohlhaber, J. I., 1980    B: Ohlhaber, J. I., 1973

C: Ohlhaber, J. I., 1979    D: Roth, S. P., 1980

E: Rice, C. I., 1977


1 is current situation


I denotes INCOMS results


ARCHITECTURE OF THE MIRAGE 2000 WEAPON SYSTEM
(ARCHITECTURE DU SYSTEME D'ARMES DU MIRAGE 2000)




S.  Croce-Spinelli 
B.  Vandecasteele 
J.F.  Ferrari 

Avions Marcel Dassault-Breguet Aviation
78,  Quai  Carnot 
92214  Saint  Cloud 
France 


Summary


The architecture of the MIRAGE 2000 Weapon System represents an advanced generation of digital systems. It is described from the standpoint of:

- the digital equipment

- the distribution of the software among these units

- the digital links

- in-flight monitoring of the system.

The principles on which the design was based are analysed, together with the development and test methods required.

The inherent flexibility, which allows adaptation to different operational requirements and to different possible versions, is shown.




1.  INTRODUCTION 

The MIRAGE 2000 Weapon System is taken as an example of a new generation of digital systems.

It is of interest to show the principles on which its design was based and how these principles underlie a whole family of systems. This family results from the flexibility inherent in these principles, a flexibility that makes it easy to create solutions of different "sizes", adapted to different operational requirements, without requiring excessive investment each time.

These architectural principles cover both "hardware" aspects, in particular everything concerning the digital links between the computers, and "software" aspects, in particular the distribution of tasks among the computers and the corresponding interfaces.

Associated with this generation of architecture are development methods that had to be worked out to take into account the complexity and variety of the tasks that the corresponding systems are able to perform.

This family of systems includes numerous implementations already in production or under development by our company.

The architecture can be roughly characterised by the use of multiplexed digital links of the "Digibus" type, a single standard of which has been used in French military aeronautics since 1974 and which is also used for many other (essentially military) needs.


2. GENERAL PRESENTATION OF THE MIRAGE 2000 WEAPON SYSTEM

The MIRAGE 2000 Weapon System consists essentially of programmable digital equipment whose internal structure is matched to the specific processing needs: memory size, processing speed of the processor, possible specialisation of the processor (computing units organised as "multiprocessors").

On the general block diagram of the system one can distinguish:

- the main sensors

- the displays and controls

- the central computers, which in particular provide centralised management of the system

- the interface units

- the Digibus.


- Sensors

The following may be mentioned in particular:

. The inertial unit, which was adopted for its usefulness in all of the aircraft's roles, navigation as well as air-to-air or air-to-ground attack. The unit has its own computer.

. The radar, which likewise has air-to-air as well as air-to-ground and navigation functions, and whose "size" results mainly from the range requirements for air-to-air interceptions. The radar relies very largely on digital signal processing and also has a programmable arithmetic unit which manages the operation of the radar, the computations and the exchanges with the rest of the system.

. The system is also able to accept other sensors, of the electro-optical type for example (in pods), which are connected through the Digibus.

. The passive (and active) countermeasures, which are also digital.

- Displays and controls


. The displays are of the cathode-ray tube type. The head-up display has a particularly large field of view, to give maximum piloting comfort in all mission phases and also to allow the aiming required by the various fire-control modes, which involve very different sight elevation angles. It may also be noted that the angles of attack flown by the MIRAGE 2000 are very large, and this too calls for large fields of view; for example, high angles of attack are used to obtain low approach speeds. The three-colour head-down display can present TV-standard images and stroke-written symbology simultaneously. The symbol generator is digital and includes a general-purpose arithmetic unit.

There is also a specialised cathode-ray tube display for the countermeasures.

In addition, a few instruments with analogue indication remain; some of them are digital.

. The controls consist mainly of:

- "real-time" controls located on the throttle and on the control stick; they allow all the actions needed in the critical phases of the missions.

- a weapon and mode selection panel driven entirely by software.

- other digital control panels, such as the navigation control panel.

- Central computers


These two computers share many computing tasks associated with the aircraft's missions, in conjunction with all the other digital equipment. In particular they provide the centralised management of the system, as will be seen below. Operating in redundancy, they manage the digital exchanges on the Digibus.

- Interface units

This category includes the armament circuits and the interfaces with the missiles.

It also includes the autopilot, which provides the link between the system's sensors and the electrical flight controls in order to implement the basic and higher autopilot modes.

The countermeasures compatibility unit also falls into this category.

- The Digibus

All the digital units shown on the block diagram are interconnected by the standard "Digibus" link, which is managed, as just stated, by one of the two central computers in normal mode and by the other in backup mode.

To this entirely digital system must be added the radio communication, radio navigation and identification equipment. Some of this equipment, moreover, makes extensive use of digital techniques (IFF, TACAN...).

3. PHILOSOPHY FOR ALLOCATING THE COMPUTING TASKS

With the gradual move from hybrid systems, which initially contained only a single digital computer, to systems of this generation, which contain a large number of them, thought had to be given to how to distribute the computations, a distribution which in turn fixes the interfaces and the exchanges between the computers.

A number of criteria for this allocation are clear: these are the technical criteria tied to the characteristics of the computers (memory size and computing speed) and of the links (maximum exchange rate on the Digibus). It must be remembered that these are "real-time" systems in which timing constraints are essential (sampling or computation rates, synchronisation and time-tagging, flyability...).

We shall not dwell further on these criteria, which are the primary determinants of the feasibility of any solution that might be considered.




On the other hand there are other, less obvious criteria: human, industrial, logistic..., for which we can give some examples.

Several categories of computation can be distinguished within the system:

a) "Autonomous" functions. These are specialised functions directly related to a particular technique. Three examples will make this clearer:

- the "sensor" function of the radar, which consists of extracting the position, velocity, acceleration... of the targets.

- the "sensor" function of the inertial unit, which allows it to measure the position, attitudes, heading... of the aircraft.

- the "symbol drawing" function of the symbol generator of the head-up and head-down displays.

It may be noted that these functions are fairly independent of one another, hence the name "autonomous" (for some functions the term "assisted" autonomous functions is also used).

These functions can be developed and debugged separately; they often even require preliminary development before being integrated into a complete weapon system, because of the technical and technological problems associated with them.

These functions call on teams of specialists at manufacturers who have accumulated experience in this field of equipment and in its maintenance.

With current digital technology, which makes easy-to-use basic components available to all (small size and low power consumption, well-developed basic software tools...), these functions should be incorporated directly in the corresponding equipment (radar, inertial unit...), whereas in the past centralising them in a single computer could be envisaged.

b) "Integrated" functions

A system is not simply the sum of the autonomous functions just described. To these must be added a number of cooperation and synthesis functions, which have been called "integrated". They make it possible to create the navigation guidance and the fire control in the broad sense.

Some examples:

- Computation of the navigation laws for attacking airborne targets, which use the data from all the sensors, starting of course with the radar, which depend on the selected weapon and may evolve with operational experience;

- Computation of the firing envelopes of the various air-to-air missiles;

- Air-to-air gun firing computations, for example computation of the tracer line or predictive computations;

- Ballistic computations for conventional air-to-ground weapons;

- Management of salvo release sequences for bombs;

- etc.

To this list must be added the fundamental system management and monitoring functions, to which we shall return in more detail in the next section. They account for a very large volume of program memory (they are essentially logic). They depend very closely on the missions and on the direct needs of the crews; their purpose is to assist the human operators in using a complex system with an extremely large number of states that a human cannot manage alone in real time.

These functions cannot be assigned a priori to a given unit. Apart from the technical criteria already mentioned, other criteria are then brought in.




. When functions depend closely on the crew, the aim is to group them in the central computers. These are functions whose development is very protracted and could only begin after the integration of all the equipment making up the system (whereas most of that equipment has already been through individual development beforehand).

. When a function is closely tied to the characteristics of a sensor, there may on the contrary be an advantage in calling on the same team of specialists and locating the computations in that equipment. This was the case on the MIRAGE 2000 for the air-to-air navigation laws, which are in the radar.

. Considerations of mission success probability sometimes take priority. Thus certain gun firing computations linked to self-defence functions are located in the symbol generator, so as to bring into play the minimum equipment necessary.

In the same way, having all the self-defence modes (Gun and Magic) in the same computer is avoided, so that single failures do not eliminate all the modes, without however requiring complete redundancy.
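
The allocation rule above (no single computer should host every self-defence mode) can be checked mechanically. The following is a minimal sketch only: the text states that some gun computations sit in the symbol generator, but the host assumed here for Magic, and all names, are hypothetical illustrations.

```python
# Hypothetical allocation of self-defence modes to host units.
# Only the gun / symbol generator pairing comes from the text;
# the host assumed for Magic is an illustration.
HOSTS = {
    "gun": "symbol_generator",
    "magic": "central_computer",
}


def modes_lost(failed_unit, hosts=HOSTS):
    """Return the set of modes that disappear when one unit fails."""
    return {mode for mode, unit in hosts.items() if unit == failed_unit}


def survives_any_single_failure(hosts=HOSTS):
    """True if no single unit failure removes every self-defence mode."""
    all_modes = set(hosts)
    return all(modes_lost(unit, hosts) != all_modes for unit in set(hosts.values()))
```

With the allocation shown, `survives_any_single_failure()` holds; placing both modes in the same unit makes it fail, which is the situation the authors say they avoid without resorting to full redundancy.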

The development of the functions cited is, moreover, generally fairly independent of that of the others.

In summary, the criteria taken into account for the allocation are:

- ease of development
- flexibility for evolution
- the particular competence of certain teams of specialists
- logistic criteria: reliability of the chains, maintainability...

They must be weighed case by case. In general there is a tendency to group functions in the central computers all the more as they are integrated, that is, as they combine a greater number of data items produced by autonomous functions, and as they are closer to the operational procedures and to use by the crew.

In particular, all the overall system management functions, which require a large volume of logic, have been placed there. Management of the Digibus, with its redundant architecture, is a special case.

It is certain that this situation is the result of the technological possibilities and of the software development means available.

It is also characteristic of a certain system "size". Currently under development, in parallel with the MIRAGE 2000, are simpler systems, with a smaller variety of weapons and modes and designed around the computer of the inertial unit, and more complex systems with a hierarchical architecture. All, however, use computers of equivalent power and Digibus-type links.

4. THE MAIN INTEGRATED AND CENTRALISED FUNCTIONS

To illustrate the foregoing, we can return in a little more detail to what has been called overall system management, and more particularly to two aspects:

- management of the controls, displays and modes

- continuous monitoring of the system and integrated maintenance

The corresponding software is grouped in the central computers.

a) Management of the controls, displays and modes

The aim is to give the pilot as much assistance as possible. The attempt is thus made to ensure that the pilot, faced with a given target or mission objective, has if possible to decide only which weapon to use, and that the system takes care of the other selections: activation of the equipment and software and activation of the displays. He is left only the choice of options, when there are any. Access to the modes must be all the simpler and more immediate when critical operational situations are involved (such as self-defence).

The lower line of the weapon and mode selection panel shows the weapons actually carried by the aircraft. When a weapon is selected (by default the system is in navigation mode), the upper line shows only the options available for that weapon. Even then, the most probable options are preselected (the pilot need change them only if necessary).




These selections automatically bring about the activation of the appropriate functions of all the equipment (hardware and software functions). In particular the reticles and symbols that must appear are entirely determined (apart from the options) by these operations.

One control nevertheless retains priority over this selection panel: a three-position switch located on the throttle and accessible at any time. The three positions correspond to:

- Magic

- Gun

- Weapon and mode selection panel

so that a self-defence mode can be entered instantly. (The weapon is changed if an air-to-air mode has already been selected.)
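
The priority of the throttle switch over the selection panel can be illustrated with a short sketch. The enumeration and function names are hypothetical; only the three positions and the default navigation mode come from the text.

```python
from enum import Enum


class ThrottleSwitch(Enum):
    """The three positions of the throttle-mounted switch described above."""
    MAGIC = "magic"
    GUN = "gun"
    PANEL = "panel"  # defer to the weapon and mode selection panel


def active_selection(switch, panel_selection=None):
    """Resolve the active weapon or mode.

    The throttle switch overrides the panel; with the switch in PANEL
    and nothing selected, the system stays in navigation mode.
    """
    if switch is ThrottleSwitch.MAGIC:
        return "magic"
    if switch is ThrottleSwitch.GUN:
        return "gun"
    return panel_selection or "navigation"
```

For instance, `active_selection(ThrottleSwitch.GUN, "bombs")` yields the gun regardless of the panel state, which is how a self-defence mode is reached instantly.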

Certain "real-time" controls located on the throttle or the stick are used differently for different modes or weapons. For example, there is a multiple control which commands:

- in air-to-air: radar break-lock and the various automatic lock-on modes (boresight, sight, vertical)

- in air-to-ground: target designation, switch to navigation, switch to attack (after preselecting an attack mode and the corresponding options on the selection panel, one can return to navigation while the attack mode is memorised).

As regards control of the displays, it may be noted that exchange-load studies led to the interface being defined as follows: the centralised management sends a list of the reticles to be presented at each instant. In addition the symbol generator receives (generally at a much higher rate) the variables for the moving reticles; these variables may be either basic physical quantities of general use (roll, pitch...), or directly the set of values needed for positioning in sight axes (example: an aiming reticle computed by the air-to-ground fire control). The symbol generator thus holds a library of all the reticles or symbols, the management of their actual presentation being performed at each instant by integrated, centralised software.

It should also be noted that this management takes into account the actual state of the equipment and of its functions. Only reticles that are valid and usable by the pilot must be presented. Where necessary this management is done by groups of reticles that are needed together for the same mode. In the event of a failure, positive instructions are presented on the modes that remain available, for example: switch to manual sight setting, refer to the instrument panel (standby instruments).

Finally, for each mode a different frame may be needed for the exchanges on the Digibus. The choice of this frame is also part of the centralised management.


In summary, the centralised management of the controls, displays and modes can be represented as a logic module whose inputs are:

- the state of the equipment

- the controls operated by the pilot

and whose outputs are:

- the list of displays

- the orders activating the hardware and software functions of the equipment

- the assignment of the controls according to the mode, and the management of the indications on the selection panel

- the choice of the Digibus exchange frame.
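
The logic module summarised above can be sketched as a pure function from inputs to outputs. This is an illustrative reduction, not the actual software: the mode table, reticle names, equipment names and frame labels are all hypothetical, and the control-assignment and panel-indication outputs are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical mode table: each reticle lists the equipment it needs,
# and each mode names the bus exchange frame it uses.
MODES = {
    "navigation": {
        "reticles": {"horizon": {"inertial_unit"}},
        "frame": "NAV_FRAME",
    },
    "air_to_air_gun": {
        "reticles": {
            "horizon": {"inertial_unit"},
            "gun_sight": {"inertial_unit", "radar"},
        },
        "frame": "AIR_AIR_FRAME",
    },
}


@dataclass
class ManagementOutputs:
    displays: list           # reticles to present now
    activation_orders: list  # healthy equipment to activate for this mode
    bus_frame: str           # exchange frame selected for this mode


def manage(selected_mode, healthy_equipment):
    """Derive the outputs from the pilot's selection and the equipment state."""
    spec = MODES[selected_mode]
    healthy = set(healthy_equipment)
    # Present only reticles whose required equipment is all healthy.
    displays = sorted(
        name for name, needs in spec["reticles"].items() if needs <= healthy
    )
    required = set().union(*spec["reticles"].values())
    orders = sorted(required & healthy)
    return ManagementOutputs(displays, orders, spec["frame"])
```

If the radar is reported failed, `manage("air_to_air_gun", {"inertial_unit"})` keeps the horizon but drops the gun sight, mirroring the rule that only valid, usable reticles are presented.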


b) System monitoring and integrated maintenance


Continuous monitoring of the system is based on the use of the continuous (or cyclic) self-tests that exist in all the equipment, and which have a high effectiveness rate, especially in the digital equipment.


The primary purpose of this monitoring is to allow complete management of the displays, controls and modes, so as to relieve the pilot of this concern entirely by leaving at his disposal only the functions that are actually operational.




The result of these self-tests circulates cyclically on the Digibus. It is used by all the equipment to establish the validity of certain chains, and by the centralised management as just described.

A spin-off from this monitoring is an integrated maintenance function.

Changes of state of the equipment during the flight can be recorded in the non-volatile memories of the central computer. This is used after the flight to trigger maintenance operations.
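
Recording only the changes of state, rather than every cyclic self-test result, keeps the maintenance log small. A minimal sketch follows, assuming self-test results arrive as time-stamped dictionaries; that format and the function name are assumptions, not the actual Digibus data layout.

```python
def record_state_changes(samples):
    """Reduce cyclic self-test results to a log of state transitions.

    samples: iterable of (time, {equipment_name: ok_flag}) tuples.
    Returns a list of (time, equipment_name, new_ok_flag) entries,
    one per change of state; the first report of a unit is not logged.
    """
    last_seen = {}
    log = []
    for time, states in samples:
        for unit, ok in sorted(states.items()):
            if unit in last_seen and last_seen[unit] != ok:
                log.append((time, unit, ok))
            last_seen[unit] = ok
    return log
```

A radar that reports healthy at t=0 and t=1 and failed at t=2 produces the single entry `(2, "radar", False)`, which is what would be read out after the flight.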

5. DIFFERENT ARCHITECTURES FOR THE CENTRAL COMPUTERS AND THE DIGIBUS

The choices that led to the development of the various hardware items, such as central computers and Digibus, give us a set of modules that can be adapted perfectly to different system architectures according to operational needs.

The main basic modules are as follows:

- Standard bus coupler allowing a unit to be a subscriber on a Digibus (single or redundant)

- Procedure coupler allowing a Digibus to be managed

- CD84 Digibus coupler providing advanced management of a Digibus (single or redundant) as well as the subscriber mode

- Specific plugs, sockets and cables providing excellent immunity to interference

- Sub-bus couplers allowing the basic Digibus procedure to be extended, in particular the number of subscribers

- Bus repeater allowing two Digibus to be interconnected while providing very good electrical decoupling between them (noise immunity).

. The first architecture (developed) is designed around two redundant Digibus managed by one computer in normal mode and by a second computer in backup mode should the first fail.

The second Digibus is used only if the first fails.

. The second architecture presented (developed) differs from the first in that each unit is connected to only one Digibus, except for the two managing computers.

The two Digibus are managed alternately by the normal computer, or by the backup computer should the first fail (equivalent throughput 1 Mbit/s).

There is, moreover, a forward Digibus and an aft Digibus on the aircraft concerned, for vulnerability reasons.

. The third architecture (developed) differs from the second in the use made of the backup computer. This is no longer dormant in normal mode but works as a subscriber, extending the memory capacity and computing power of the first computer.

If the computer that manages the buses in normal mode fails, the backup computer manages the two Digibus alternately.

. The fourth architecture (under development) provides two independent Digibus working simultaneously rather than alternately (equivalent throughput 2 Mbit/s).

Each computer, in normal mode, manages one bus and is at the same time a subscriber on the second; transfers of information between the two Digibus take place through the COS couplers built into the two computers.

If one of the two computers fails, the remaining computer can manage the two Digibus alternately, as in the preceding architecture.
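
The reconfiguration rule of the fourth architecture can be expressed as a small decision function. This is a sketch under stated assumptions: the per-bus rate of 1 Mbit/s is inferred from the quoted equivalent throughputs (1 Mbit/s when the buses are served alternately, 2 Mbit/s when both run at once), and the bus and computer labels are hypothetical.

```python
BUS_RATE_MBIT = 1.0  # assumed per-bus rate, inferred from the quoted figures


def bus_plan(computer_1_ok, computer_2_ok):
    """Return (bus manager assignment, equivalent throughput in Mbit/s).

    Both computers healthy: each manages one bus, both buses run at once.
    One computer healthy: it manages both buses alternately.
    """
    if computer_1_ok and computer_2_ok:
        return {"bus_A": "C1", "bus_B": "C2"}, 2 * BUS_RATE_MBIT
    if computer_1_ok or computer_2_ok:
        survivor = "C1" if computer_1_ok else "C2"
        return {"bus_A": survivor, "bus_B": survivor}, BUS_RATE_MBIT
    return {}, 0.0
```

The degraded case reproduces the behaviour of the second and third architectures, where a single computer serves the two buses in turn.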

. The next architecture presented (a system at the project stage) allows the Digibus to be extended to all the store stations by increasing the subscriber capacity.




This system is distinguished by:

- the use of sub-bus couplers,

- programming of the analogue inputs/outputs to the various store stations,

- the use of sophisticated weapons and stores which tell the system what they are and which store station they are attached to,

- the use of a mass memory from which the necessary programs are loaded according to the store configuration,

- pylons with a standard digital and analogue interface, adaptable to different buses (Digibus, ARINC 429 and MIL-STD-1553B), to allow weapon interoperability.
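
The combination of self-identifying stores and a mass memory suggests a simple lookup from the reported configuration to the programs to load. The sketch below is purely illustrative: the store types, program names and report format are hypothetical and are not taken from the system described.

```python
# Hypothetical catalogue held in mass memory: store type -> program name.
PROGRAM_LIBRARY = {
    "magic": "air_to_air_missile_fire_control",
    "bomb": "ballistics",
}


def programs_to_load(store_reports, library=PROGRAM_LIBRARY):
    """Select the programs needed for the stores actually carried.

    store_reports: list of (station_number, store_type) pairs, as each
    store would announce itself to the system. Unknown types are ignored.
    """
    return sorted({library[kind] for _, kind in store_reports if kind in library})
```

Two Magic missiles and one bomb would thus load each of the two programs once, however many stations carry the same store type.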

. The last architecture presented (developed) is characterised by a distributed structure, no longer centralised like the preceding ones.

6. SOFTWARE DEVELOPMENT METHODOLOGY

In general terms, the methodology is a form of technical, human and administrative organisation of the work to be done, making it possible to:

- Define, allocate and coordinate the activities of the staff.

- Forecast and control the schedules, costs and quality of the work.

The programs implementing the various system functions must be reliable and easy to modify.

To achieve these objectives, the methodology must rest on the following principles:

- Close collaboration between the aircraft manufacturer and the software producers

- Very thorough test and validation procedures

- Breakdown of the tasks into perfectly defined steps

- Use of automated aids.

Software development comprises three main phases, broken down into steps:

- Software definition:

. overall functional specifications

. detailed specifications of the operational functions

- Software production:

. detailed software specifications
. overall analysis
. detailed analysis
. development of the test methods
. programming and debugging
. overall test

- System validation:

. static and dynamic ground tests
. flight tests

. The software definition phase is the responsibility of the aircraft manufacturer, with the close participation of the users, the equipment manufacturers and the software producers.

It consists of producing two types of document:

- The overall functional specifications of the system, in which the functions are described from an operational point of view without regard to how they are divided among the various units.

- The detailed specifications of the operational functions, which specify, for each function to be implemented, the division of the software among the various units and the tasks to be performed by each of them.




. The software production phase is the responsibility of each software producer, that is, in general, of each equipment manufacturer.

The first three steps specify all the information needed to write and debug the programs: flowcharts, memory budget, computing loads, division into modules, test means,...

The next two steps proceed simultaneously:

- Development of the test means

- Programming and static debugging of the various modules.

The last step makes it possible to check that the programs conform to the specifications. The tests are performed dynamically on a software validation rig which supplies coherent sets of theoretical parameters in real time.

. The system validation phase is again the responsibility of the aircraft manufacturer. The aim is to test the various functions in an operational environment with the real equipment and with wiring that conforms to that of the aircraft.

An integration bench is used first for static tests (equipment tests, wiring checks, accuracy measurements...) and then for dynamic tests using a stimulation system.

This stimulation system makes it possible to inject, at the level of the system's sensors (inertial unit, radar, air data computer,...), a coherent set of parameters in real time, previously recorded in flight or on simulators. It allows a test flight, or a particular phase of the mission, to be "replayed" as many times as desired in order to make particular measurements or recordings.

C'est  au  cours  de  cette  phase  d'essais  au  sol  qu'un  certain  nombre  de  modifications 
4  apporter  au  logiciel  vont  apparaltre,  dues  4  des  erreurs  cV  programmation,  des 
changements  ou  des  precisions  4  apporter  aux  specifications. 

Enfin  la  phase  d'essais  en  vol  se  termine  par  1' acceptation  du  logiciel  qui  sera 
implantSe  dans  les  equipements  de  serie. 


Fig.  I  Mirage  2000  -  nav/attack  system 




central  situation 


LARGE FIELD OF VIEW

. NAVIGATION
. GUNS AND ROCKETS
. LOW AND HIGH DRAG BOMBS
. APPROACH


PRESENTATION OF ALL NEEDED INFORMATION

IN A SINGLE PLACE

FOR EACH PHASE OF A MISSION


Fig.2  Displays 


"REAL  TIME"  CONTROLS  ON  THE  STICK 
AND  ON  THE  THROTTLE 


CENTRAL MODES AND OPTIONS SELECTION


SOFTWARE AIDED SELECTION :

. ONLY POSSIBLE OPTIONS AVAILABLE AT ONE TIME


Fig. 3  Controls 


Fig.4 Mirage 2000 - pilot's stick grip


Fig.6 Weapons control panel


Fig.8 Software architecture


AMD  HAVE  RECOMMENDED  AND  STUDIED  FOR  MANY  YEARS  THE  USE  OF  DIGITAL  TECHNIQUES. 


. 1961 : IDEA OF USE OF A DIGITAL COMPUTER FOR NAV/ATTACK COMPUTATIONS
. 1967 : SUGGEST THE USE OF MULTIPLEXED DIGITAL BUSES
. 1968 : DEVELOPMENT OF A MULTIPLEXED DIGITAL BUS AND STUDY OF PROTECTION
         AGAINST INTERFERENCES.
. 1970 : STUDY WITH EMD OF GINA DIGIBUS
. 1973 : DEVELOPMENT OF THE FIRST AIR TO GROUND ATTACK INTEGRATED SYSTEM
         USING A UNIVERSAL DIGITAL COMPUTER, INU AND CRT HUD
. 1974 : SUPER ETENDARD AIRCRAFT
. 1974 : DEVELOPMENT WITH EMD OF GINA DIGIBUS
. 1975 : DIGIBUS GINA HAS BEEN SELECTED FOR SUPER MIRAGE AND MIRAGE 2000
         INTEGRATION TESTS

. 1980 : DIGIBUS GINA HAS BEEN OPERATED ON INTEGRATION BENCHES AND FLIGHT TESTS


IT  IS  USED  ON  ALL  MODERN  AIRCRAFT  : 

IN DEVELOPMENT : MIRAGE 2000
                 ATLANTIC
                 MIRAGE F1

IN SERIAL PRODUCTION : MIRAGE F1
IT IS STANDARDIZED IN THE MILITARY A/C, SHIPS, MISSILES


Fig. 9  History  of  Gina  Digibus 


REDUNDANT BUS
BACK-UP MANAGEMENT UNIT


Fig. 10 Architecture 1


CM  :  SIANGAN)  count* 
Cf>  :  C0MMAM3  COLJ1U 
S  :  MCNflORtM  IMS 




FRONT  AND  REAR  BUS  FOR  TACTICAL  PROBLEMS. 

TWO  MANAGEMENT  UNITS  :  MAIN  AND  BACK-UP  UNITS 
FLIP-FLOP MANAGEMENT OF THE TWO BUSES

EQUIVALENT : 1 Mbit/s


Fig.  11  Architecture  2 


BACK-UP  MANAGEMENT  UNIT  PERFORMS  ADDITIONAL  COMPUTATIONS  IN  NORMAL  MODE 


BACK-UP  MODE  :  (FAILURE  OF  ONE  MANAGEMENT  UNIT)  •  IDENTICAL  TO  NORMAL  MODE  BUT  WITH  FEWER  TASKS 


Fig. 12 Architecture 3


TWO SEPARATE BUSES, EACH ONE CONTROLLED BY A COMPUTER : EQUIVALENT 2 Mbit/s
EACH COMPUTER CONTROLS ITS BUS AND IS CONNECTED TO THE OTHER ONE
BACK-UP MODE : EACH COMPUTER CONTROLS THE TWO BUSES BY FLIP-FLOP




F/A-18A  TACTICAL  AIRBORNE  COMPUTATIONAL  SUBSYSTEM 


T.  V.  McTigue 
Branch  Chief 

McDonnell  Aircraft  Company 
McDonnell  Douglas  Corporation 
Post  Office  Box  516 
St.  Louis,  Missouri  63166  U.S.A 


ABSTRACT 


This paper presents a description of the Tactical Airborne Computational Subsystem used in the U.S.
Navy/McDonnell Douglas F/A-18A Hornet Fighter/Attack Weapon System. It describes an airborne processing
system  of  physically  distributed  computer  resources  interconnected  through  a  multiplex  communication 
network.  Specifically,  the  paper  describes  the  design,  development,  and  integration  of  the  Tactical 
Airborne  Computational  Subsystem  for  the  U.S.  Navy/McDonnell  Douglas  F/A-18A  Hornet  Weapon  System.  The 
F/A-18A  Hornet  tactical  computer  subsystem  consists  of  two  central  Mission  Computers  and  a  number  of 
distributed  processors  embedded  in  various  sensor  and  display  subsystems.  This  distributed  processing 
system  is  interconnected  by  and  communicates  over  a  MIL-STD-1553A  serial  1  MHz  command/response 
multiplex  network.  The  distributed  processing  system  architecture  is  discussed  and  the  rationale  is 
presented  for  the  partitioning  of  the  computational  tasks  between  the  central  Mission  Computers  and  the 
distributed  processors  embedded  in  the  sensor  subsystems.  The  salient  features  of  the  central  Mission 
Computer and the distributed processors are discussed along with a description of the functional
operation of the interconnecting MIL-STD-1553A multiplex communications system. Finally, the development
process for the Operational Flight Program (OFP) for the central Mission Computers is described,
including a discussion of the support facilities which were used for the software integration and validation.

1.  INTRODUCTION 


The purpose of the F/A-18A Hornet Weapon System is to deliver air-to-air and air-to-ground weapons
on targets that must be detected, identified, acquired, tracked, and destroyed by the pilot using
sophisticated sensors and weapons. In the course of an F/A-18A Hornet mission, millions of split-second
computations and decisions must be made within the aircraft. The pilot, in addition to flying the
aircraft, must constantly monitor the instruments and interpret the readings to ensure that the weapon
system can accomplish its purpose. One-man operability was a prime goal in the design of the F/A-18A
Hornet. Every decision and task that could be safely removed from the pilot was incorporated in a
highly integrated computational subsystem. The operations within the subsystem are still at the pilot's
command, but he is able to perform his primary tasks with confidence based on reliable, real-time
operation of his computational subsystem. This subsystem consists of two mission computers and a number
of distributed computers in various sensor and display equipments. The Operational Flight Program (OFP)
for the Mission Computer was developed by McDonnell Douglas Corporation, St. Louis, Missouri, and is
being flight-tested and qualified by McDonnell Douglas and Navy pilots at the Naval Air Test Center,
Patuxent  River,  Maryland.  The  first  U.S.  Navy  squadron  was  activated  at  the  Naval  Air  Station  in 
Lemoore,  California,  in  February  1981,  and  the  first  production  aircraft  will  be  delivered  in  July  1981. 
The  F/A-18A  has  also  been  selected  by  Canada  and  is  under  serious  consideration  by  a  number  of  other 
U.S.  allies. 

2.  DATA  PROCESSING  SUBSYSTEM  (DPS) 

The  DPS  consists  of  two  Mission  Computers  (MC)  and  several  distributed  computers  in  various  sensor 
and display equipments. The Mission Computers, which employ the U.S. Navy AN/AYK-14 standard computer,
integrate the overall operation of the avionics. The rationale for two Mission Computers was the same
as  for  two  engines.  When  they  both  are  operational,  they  provide  increased  weapon  system  performance. 
When  one  is  not  operational,  the  other  provides  enough  performance  for  self-defense  and  safe  return. 

The airborne computational requirements were classified into two major categories (Figure (1)):

o  Sensor-oriented  computations 

o  Mission-oriented  computations 

Sensor-oriented  computations  are  defined  to  be  those  independent  computations,  such  as  sensor  coor¬ 
dinate  transformations,  platform  management,  and  signal  processing,  which  are  peculiar  to  a  particular 
sensor  or  display.  Mission-oriented  computations,  such  as  weapons  launch  calculations,  are  defined  to 
be those computations directly related to performing the mission and dependent on the integration of
information from several avionics subsystems. Table I shows typical examples of the two categories of
computations. 


TABLE I - COMPUTATIONAL CATEGORIES

Sensor-Oriented                     Mission-Oriented

o Air Data Calculations             o Air-to-Air steering and launch
o Radar Signal Processing             zones for gun and missiles
o Inertial Platform Management      o Air-to-Ground steering and release
o Display Symbol Generation           for bombs, rockets, gun, and missiles
                                    o Selection of best available data
                                      from various sensors
                                    o Integrated display management




A system design technique was used on the F/A-18A Hornet that produced a set of Integration Block
Diagrams which were used to partition the system requirements into specific tasks for each subsystem
onboard the aircraft. The total airborne computational tasks were partitioned into mission-oriented
tasks allocated to central mission computers and into sensor-oriented tasks performed in distributed
processors in the sensor subsystems (Figure (2)). This relieved the central computers of those tasks
which could be more effectively performed and managed in distributed and independent sensor processors.
This approach offered functional modularity of the sensors, whereas system integration was provided by
the Mission Computers. Hence, improved sensors and displays can be added later to the Avionics System,
and present ones can be changed, with minimum impact on other equipment. Likewise, if the armament is
altered for new or modified weapons as the mission of the aircraft is enlarged, such changes can be
accommodated primarily through changes to the Mission Computer and Stores Management Set software.

A top-down software design approach is used, which partitions each computer program into software
modules of manageable size based on functional groupings of computational tasks. The rationale for
modular software is analogous to that for modular hardware. First, it permits each module to be
independently developed, debugged, and tested in parallel with the other modules. Second, it allows
changes to occur within a module without causing changes to be made in other modules, as long as the
external modular interface remains the same. Analogous to the modularity and controlled interfaces in
the hardware, new programming modules can be added and old ones deleted without impacting the whole
program as long as the module interfacing rules are followed. Documentation and understanding of the
total computer program is simplified, because each module can be described and understood as a separate
entity.

2.1  Sensor-Oriented  Processing 

On-board  the  F/A-18A  Hornet  there  are  four  major  subsystem-embedded  reprogrammable  computers  and  a 
number  of  smaller  subsystems  with  embedded  microprocessors  with  Read-Only  Memories  (ROM).  Table  II 
summarizes  the  computer  hardware  for  the  major  subsystems  with  reprogrammable  computers.  Table  III 
presents  the  computer  hardware  information  for  the  subsystems  with  ROM  computers. 


TABLE II - SENSOR-ORIENTED REPROGRAMMABLE COMPUTERS

COMPUTER                      CPU       SPEED        MEMORY

INERTIAL NAV COMPUTER         2901      238 KOPS     16K CORE
RADAR DATA PROCESSOR          2901      700 KOPS     250K DISK/16K RAM
RADAR SIGNAL PROCESSOR        54S181    7100 KOPS    250K DISK/48K RAM
STORES MANAGEMENT PROCESSOR   8080      200 KOPS     32K CORE


TABLE III - SENSOR-ORIENTED ROM COMPUTERS

SUBSYSTEM                               CPU         MEMORY

AIR DATA COMPUTER                       2901        5K ROM
COMM SYSTEM CONTROLLER                  8080        16K ROM
FLIGHT CONTROL COMPUTER (4)             MCP-701A    44K ROM
FORWARD-LOOKING INFRARED                9900        32K ROM
LASER SPOT TRACKER                      2901        12K ROM
MAINTENANCE MONITOR PANEL               8080        1K ROM
MAINTENANCE SIGNAL DATA RECORDER SET    8080        14K ROM
MULTIPURPOSE DISPLAY (2)                2901        5K ROM

Each  sensor  computer  performs  only  those  computations  necessary  to  perform  its  well-defined  task. 
This  includes  all  computations  required  to  translate  some  measured  physical  parameter,  such  as  air 
pressure,  into  useful  information  for  the  pilot,  such  as  altitude,  airspeed,  and  Mach  number.  Once  the 
information is computed, it is sent to the Mission Computer over the Avionics Multiplex (MUX) bus.
There it is used with information from other sensors to perform the mission-oriented computations as
well  as  for  display  to  the  pilot.  Figure  (3)  shows  the  major  sensor  computers  and  their  allocated 
software  computational  tasks. 

2.2  Mission-Oriented  Processing 

2.2.1  Mission  Computer  Hardware 

The Mission Computer Subsystem consists of two identical computers built by Control Data
Corporation (CDC). They are the new U.S. Navy Standard Airborne Computers designated the AN/AYK-14. Although
the hardware of the two computers is identical, their computer programs are different and are dedicated
to specific processing tasks. The AN/AYK-14 is a high-speed, general purpose digital computer
specifically designed to meet the real-time requirements of airborne weapon systems. The computer uses four
AMD 2901 four-bit slice Large Scale Integrated (LSI) circuits to implement the 16-bit Central Processing
Unit (CPU). The CPU is microprogrammed by means of ROM firmware to emulate the instruction set of the
U.S. Navy Standard Shipboard Computer designated the AN/UYK-20. By emulating the AN/UYK-20 instruction
set, the AN/AYK-14 can use the same CMS-2M Higher Order Language (HOL) support software originally
designed for the AN/UYK-20.




The AN/AYK-14 consists of ten plug-in modules and a single plug-in modular power supply contained
in one Weapon Replaceable Assembly (WRA) weighing about 42 pounds and occupying 0.625 cubic feet. Each
computer contains 65,536 (16-bit) words of 7/13 mil (inside/outside diameter) core memory for a total of
more than a million individual cores per computer. The memory in each Mission Computer can be doubled
from 64K to 128K within the present equipment envelope simply by replacing the two present 32K memory
modules with two recently-developed 64K modules. Figure (4) shows the computer and some of its plug-in
modules.
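
As a check on the core count quoted above, the total follows from one core per stored bit. The short Python sketch below is illustrative only: it assumes that the two parity bits per word listed among the computer's storage features are also held in core, and the constant and function names are invented for this example.

```python
# Core-count check for one AN/AYK-14 memory, using figures quoted in the text.
WORDS = 65_536       # 16-bit words per computer
DATA_BITS = 16       # information bits per word
PARITY_BITS = 2      # parity bits per word (assumed here to be stored in core too)


def core_count(words: int, bits_per_word: int) -> int:
    """Magnetic core memory stores one bit per core."""
    return words * bits_per_word


data_only = core_count(WORDS, DATA_BITS)
with_parity = core_count(WORDS, DATA_BITS + PARITY_BITS)

print(f"cores for data bits only: {data_only:,}")
print(f"cores including parity:   {with_parity:,}")
```

Counting the 16 information bits alone gives 1,048,576 cores, and including the parity bits gives 1,179,648, so either reading is consistent with "more than a million".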


The  salient  features  of  the  computer  are  presented  below: 


o Type and organization        General-purpose, stored program, parallel, binary,
                               fixed-point, integer, two's complement

o Storage                      65,536 words, 16 bits/word plus 2 bits/word parity,
                               nonvolatile, random access, 3D magnetic core,
                               0.9 microsecond cycle time, 16 bit addressable

o Instruction execution rate   450,000 operations/sec (depending on instruction mix)

o CPU                          Four AMD 2901 4-bit slice LSIs

o Instruction Set              ROM firmware emulation of AN/UYK-20 instruction set

o Serial Input/Output          Three independent, dual-bus MIL-STD-1553A multiplex
                               channels (serial, 16 data bits plus 1 bit parity,
                               transformer-coupled, 1 MHz, 50,000 words/sec/channel)

o Discrete Input/Output        32 input discretes
                               32 bi-directional input or output discretes

o Interrupts                   Eight external, 22 internal

o Clocks                       Two programmable clocks

2.2.2 Mission Computer Software



Each of the two Mission Computers is dedicated to specific processing tasks by means of its stored
program. One computer is assigned the Navigation (NAV) and Support processing tasks and associated
display management. The other computer is assigned the Air-to-Air and Air-to-Ground Weapon Delivery
processing tasks and associated display management. The stored program in each computer is partitioned
into functional software modules. Each software module is assigned to an engineer/programmer
development team that follows the software module from initial concept until delivery of the computer program
in the weapon system. Each computer has a small backup software module for selected functions of the
other computer. These backup modules are executed only in the event the primary computer for these
functions should fail. The functional software modules in each computer are shown in Figure (5).


2.2.2.1 Executive Module

The executive program module imposes order and structure on the entire F/A-18A operational flight
program. All functional program modules are processed under executive control, which sequences them in
an appropriate flow and calls them at a rate consistent with their requirements.

Six major tasks are performed by the executive module. First, it initializes the MC after start-up
or after restart from a power interruption. Second, it schedules the order and rate of execution of
each functional module. Third, it schedules the order and rate of input/output operations for each
module. Fourth, it controls the servicing of all interrupts, external and internal. Fifth, it manages
inter-computer communication between the Navigation MC and the Weapon Delivery MC. Sixth, it uses the
scheduling and input/output management functions to ensure proper sequencing of applications processing.
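
The scheduling role of the executive can be illustrated with a rate-based cyclic schedule. The Python sketch below is not the F/A-18A executive: it assumes a 20 Hz base frame, and the module names and rates in the usage lines are hypothetical, chosen only to show how modules that run at different rates can be called in a fixed order within each frame.

```python
from typing import Callable, Dict, List, Tuple

BASE_RATE_HZ = 20  # assumed base (minor-frame) rate, for illustration only


def build_schedule(rates_hz: Dict[str, int],
                   base_rate_hz: int = BASE_RATE_HZ) -> List[List[str]]:
    """Lay out one second of minor frames; each frame lists the modules to call.

    A module running at rate r is placed in every (base_rate / r)-th frame,
    in the order in which the modules were declared.
    """
    frames: List[List[str]] = [[] for _ in range(base_rate_hz)]
    for name, rate in rates_hz.items():
        if rate <= 0 or base_rate_hz % rate != 0:
            raise ValueError(f"{name}: rate must divide {base_rate_hz} Hz")
        period = base_rate_hz // rate
        for frame in range(0, base_rate_hz, period):
            frames[frame].append(name)
    return frames


def run_major_frame(frames: List[List[str]],
                    tasks: Dict[str, Callable[[], None]]) -> List[Tuple[int, str]]:
    """Call every scheduled module once per slot and return the call order."""
    log: List[Tuple[int, str]] = []
    for index, frame in enumerate(frames):
        for name in frame:
            tasks[name]()  # a real module would do input, processing, output
            log.append((index, name))
    return log


# Hypothetical modules and rates.
schedule = build_schedule({"weapon_delivery": 20, "navigation": 10, "self_test": 1})
calls = run_major_frame(schedule, {name: (lambda: None) for name in
                                   ("weapon_delivery", "navigation", "self_test")})
```

A fixed table of this kind makes the order and rate of execution deterministic, which is the property the executive is described as enforcing.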

2.2.2.2 Air-to-Air Module

The  air-to-air  module  performs  the  following  functions: 

1)  initializes  the  radar  air-to-air  search  pattern  based  on  the  weapon  selected 

2)  computes  aiming  reticle  for  director  or  disturbed  gun  mode 

3)  computes  aiming  reticle  for  director  or  manual  rocket  mode 

4)  computes  maximum  and  minimum  launch  ranges  and  steering  cues 

5)  computes  other  aircraft  and  target  parameters  for  display. 




2.2.2.3 Air-to-Ground Module

The  air-to-ground  module  performs  the  following  functions: 

1) designates ground targets using radar, forward looking infrared (FLIR), laser spot tracker
(LST), or visual means

2)  automatically  positions  sensors 

3)  calculates  ballistic  release  times 

4)  calculates  steering  cues  for  weapon  release  and  reattack 

5)  calculates  launch  envelope  data  for  air-to-ground  missiles  and  gun 

6) issues release pulses for correct weapon delivery and weapon intervals

7)  manages  strike  camera  (SCAM)  for  damage  assessment. 

2.2.2.4 Navigation Module

The  navigation  module  performs  the  following  functions: 

1)  selects/calculates  the  best  available  navigation  data 

2)  calculates  steering  to  prestored  waypoints 

3)  performs  velocity  and  position  updates 

4)  performs  target  marking 

5)  calculates  range,  bearing,  heading,  and  steering  error  to  selected  waypoint  and  TACAN  station. 
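
The range, bearing, and steering-error computation named in item 5 can be sketched as follows. This is a generic great-circle calculation on a spherical Earth, given in Python for illustration; it is not taken from the F/A-18A flight program, and the Earth radius, units (kilometres, degrees), and function names are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def range_and_bearing(lat1: float, lon1: float,
                      lat2: float, lon2: float) -> tuple:
    """Great-circle range (km) and initial true bearing (deg) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    dphi = phi2 - phi1
    # Haversine formula for the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    # Initial bearing, measured clockwise from north.
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return distance, bearing


def steering_error(bearing_deg: float, heading_deg: float) -> float:
    """Signed turn (deg) from current heading to the bearing, in [-180, 180)."""
    return (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0


rng_km, brg_deg = range_and_bearing(0.0, 0.0, 0.0, 1.0)  # one degree due east
error_deg = steering_error(brg_deg, 80.0)                # positive: turn right
```

Wrapping the difference into the interval from -180 to 180 degrees keeps the steering cue pointing in the shorter turn direction when the heading and bearing straddle north.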

2.2.2.5 Data Link Module

The data link module decodes and processes messages received from a shipboard, airborne, or
ground-based terminal. The messages contain information used in the following functions:

1)  waypoint  insertion 

2)  display  of  data  for  vectoring  to  airborne  targets  and  rendezvous  points 

3)  display  of  precision  course  direction  data  for  air-to-ground  weapon  delivery 

4)  display  of  automatic  carrier  landing  data 

5)  processing  of  couple  requests  to  the  flight  control  computers 

6)  processing  of  test  messages 

7)  processing  of  radar  target  data  and  aircraft  data  to  be  transmitted  in  the  data  link  reply 
messages. 

2.2.2.6 Tactical Controls and Displays Module

The tactical controls and displays module performs the following functions:

1) manages the radar control panel/display graphics program. Symbology controlled by this
function includes targets, target status, radar status, air-to-air weapon delivery cues,
air-to-ground and navigation cues, armament status, aircraft flight status, data link targets and
cues, pushbutton legends/status, and hands-on-throttle-and-stick cues. This function also
controls radar and display moding so the above calligraphic symbology can be superimposed on
radar video presentations.

2) manages the FLIR control panel/display graphics program. Symbology controlled by this function
includes FLIR status, aircraft flight status, and pushbutton legends. This calligraphic
symbology is superimposed on FLIR video presentations.

3) manages the LST/strike camera control panel/display graphics program. Symbology controlled by
this function includes LST/strike camera status and pushbutton legends.

4) manages the air-to-ground guided weapons control panels/display graphics program. Symbology
for the high-speed antiradiation missile (HARM), Maverick, and Walleye weapons controlled by
this function includes targets, weapon status, aircraft flight status cues, weapon delivery
cues, and pushbutton legends. This calligraphic symbology is superimposed on weapon video
presentations.

5) manages the stores management control panel/display graphics program. Symbology controlled by
this function includes stores status for each station, air-to-ground weapon delivery
programming, and pushbutton legends.




2.2.2.7 Support Controls and Displays Module

The support controls and displays module performs the following functions:

1) manages the cautions/advisories display graphics program. Symbology controlled by this
function includes cautions and advisories for engines, hydraulics, electrical, environmental
control system, flight control set, and avionics systems.

2) manages the built-in test (BIT) display graphics program. Symbology controlled by this
function includes avionic subsystem status, memory inspect data, maintenance panels, and pushbutton
legends.

3) manages the test pattern display graphics program. Symbology controlled by this function
includes test pattern, pushbutton test cues, and boresight cues.

4) manages the engine display graphics program. Symbology controlled by this function includes
left and right engine status and pushbutton legends.

5) manages the checklist display graphics program. Symbology controlled by this function includes
checklist cues, aircraft data, and pushbutton legends.




2.2.2.8 Navigation Controls and Displays Module


The navigation controls and displays module performs the following functions:

1) manages the horizontal situation display control panel/display graphics program. Symbology
controlled by this function includes compass and associated steering cues, aircraft flight
status, TACAN/waypoint data, alignment data, navigation data, update cues, and pushbutton
legends.

2) manages the attitude director indicator display graphics program. Symbology controlled by this
function includes aircraft flight status cues, such as attitude and turn rate.

3) manages the data link display graphics program. Symbology controlled by this function includes
command data, data link cues, and data change cues.

4)  manages  the  up-front  control  panel  to  provide  data  entry/readout  and  mode  selection  capability 
for  autopilot,  navigation  data,  and  weapon  delivery  data. 

5) computes the position of the film strip for moving map and navigation functions related to the
horizontal situation display.

2.2.2.9 Head-Up Display Module

The head-up display (HUD) module manages the HUD graphics program. Symbology controlled by the HUD
module includes aircraft flight data, data link cues, navigation cues, radar status, armament status,
air-to-air weapon delivery cues, and air-to-ground weapon delivery cues.

2.2.2.10  Inflight  Engine  Condition  Monitor  Module 

The inflight engine condition monitor module monitors various engine and associated aircraft
parameters to provide engine health information to the pilot and maintenance personnel. Cautions,
advisories, and real time engine parameters are displayed in the cockpit. Life usage indices and other
engine maintenance information are transmitted to the Maintenance Signal Data Recorder (MSDR).

2.2.2.11  Inflight  Monitoring  and  Recording  Module 

The inflight monitoring and recording module monitors and processes various aircraft sensor outputs
for control and display of most of the pilot cautions and advisories and transmits avionic and
non-avionic equipment failures to the MSDR. Control is provided for the data recorder and provision is made
for the recording of tactical data in the air-to-air and air-to-ground aircraft modes. Also, aircraft
fatigue levels are monitored and recorded during flight.

2.2.2.12  Avionics  Built-In  Test  Module 

The avionics built-in test module provides the control by which an operator can run individual
tests on each of the interfacing subsystems. It also evaluates data received by the MC from each of the
interfacing subsystems as to their operational status. This data is correlated by subsystem and current
status and is displayed in the cockpit. In addition, the data is converted into predefined codes, each
representative of a specific failure of an individual subsystem, for transmission to the MSDR.


2.2.2.13 Mission Computer Self-Test Module

The mission computer self-test module performs the following functions:

1) immediately after computer turn-on, tests those functions which, when tested, interfere with
normal computer operation;

2) periodically tests those functions of the MC CPU and memory which, when tested, do not
interfere with normal computer operation as well as performing an end-to-end check of the
capability of the MC to communicate with each peripheral;

3) maintains error information for later maintenance action; and

4) latches WRA fault indicator and sets WRA status signal as required.

2.2.2.14  Mission  Computer  Backup  Modules 

A backup module is resident in each computer. Each backup module performs essential software
functions of the other mission computer when a failure occurs in that computer.

2.2.2.15  Mathematical  Subroutines  Module 

The  mathematical  subroutines  module  supports  other  program  modules  by  providing  common  mathematical 
routines  such  as  trigonometric,  logarithmic,  and  matrix  operations. 

2.3  Avionics  Multiplex  System 

Digital data between the Mission Computers and the peripheral avionics components is transferred on
the MC-controlled Avionics Multiplex System. The system consists of three multiplex channels, as shown
in Figure (6). Each channel consists of two redundant 1 MHz MIL-STD-1553A buses, with only one bus of
each channel active at any given time.

2.3.1  Physical  Characteristics 

Each bus is operated in a half-duplex fashion (two-way transmission, but not simultaneously) using
self-clocking Manchester encoding and word-serial, bit-serial, time-division format. All peripheral
units on a single channel are connected to the transmission lines comprising that channel in parallel,
party-line fashion, such that physical removal of a unit from the lines does not interrupt the
continuity of the lines. All units on the same channel see all of the data on that pair of buses. However, on
a given channel, data is transferred only between the MC and a single peripheral at a time. Each bus is
independently routed through the aircraft to ensure reliable communication in the event of damage.
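
Manchester encoding is self-clocking because every bit cell carries a transition at its midpoint. The Python sketch below illustrates the idea at the level of half-bit signal levels; the polarity convention used (a one sent as high then low, a zero as low then high) and the function names are assumptions made for this example, not details taken from the text.

```python
from typing import List


def manchester_encode(bits: List[int]) -> List[int]:
    """Map each bit to two half-bit levels: 1 -> (1, 0), 0 -> (0, 1)."""
    halves: List[int] = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        halves.extend((1, 0) if bit else (0, 1))
    return halves


def manchester_decode(halves: List[int]) -> List[int]:
    """Recover bits from half-bit levels, rejecting cells with no mid-bit transition."""
    if len(halves) % 2 != 0:
        raise ValueError("expected an even number of half-bit levels")
    bits: List[int] = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(first)  # the first half carries the bit value in this convention
    return bits


encoded = manchester_encode([1, 0, 1, 1])
decoded = manchester_decode(encoded)
```

Because a valid cell always changes level at its centre, a receiver can recover bit timing from the data itself and can treat a cell without a transition as an error.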

A multiplex terminal is incorporated as an integral part of each equipment interfaced with the MC.
Each terminal performs the necessary functions to receive and validate data from the MC and transmit
data to the MC. In addition, it provides the necessary conversion/reformatting of data to interface
with the equipment component logic and accepts/generates the control signals that coordinate the
transfers within the equipment. No peripheral is required to receive or transmit over more than one bus
at a time. Data words transmitted by the MC or by the peripheral are always transmitted over the same
bus that carried the command word.

All transmissions are formatted into standard messages. The MC initiates each message transmission
by a command word that identifies the message, the number of data words, and the peripheral involved.
Each word transferred contains a three-bit sync waveform, 16 information/control bits, and one parity
bit.
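
The word format above fixes the raw capacity of a bus. The Python sketch below is illustrative only: it counts the sync waveform as three bit times, and it computes an odd parity bit, which is the convention MIL-STD-1553 specifies; the text itself states only that each word carries one parity bit. The function names are invented for this example.

```python
SYNC_BIT_TIMES = 3        # sync waveform occupies three bit times
INFO_BITS = 16            # information/control bits per word
PARITY_BITS = 1           # one parity bit per word
BIT_RATE_HZ = 1_000_000   # 1 MHz bus


def word_bit_times() -> int:
    """Total bit times occupied by one word on the bus."""
    return SYNC_BIT_TIMES + INFO_BITS + PARITY_BITS


def max_words_per_second(bit_rate_hz: int = BIT_RATE_HZ) -> int:
    """Upper bound on word rate, ignoring message gaps and response times."""
    return bit_rate_hz // word_bit_times()


def odd_parity_bit(value: int) -> int:
    """Parity bit that makes the count of ones in (16 bits + parity) odd."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    ones = bin(value).count("1")
    return 0 if ones % 2 == 1 else 1


bits_per_word = word_bit_times()
peak_rate = max_words_per_second()
```

At 1 MHz this gives 20 bit times per word and an upper bound of 50,000 words per second on a bus, before allowing for inter-message gaps and peripheral response times.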


2.3.2  Functional  Characteristics 


Only one of the buses of each redundant pair is active at any one time. The MC selects which of
the data buses is to be used for data transmission and initiates each data exchange over the selected
bus.


If the MC detects an input/output (I/O) error, i.e., no response from a peripheral, a parity error,
or a data dropout, it will terminate processing of the I/O message. The MC then re-interrogates the
peripheral by re-transmitting the same command word on the other bus. If another error occurs on the
second bus, the MC internally flags an invalid response condition and proceeds to the next transmission
scheduled for the peripherals on that channel.
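
The error handling just described amounts to one retry on the alternate bus. The Python sketch below shows that control flow only; the `transact` callable, the `BusError` exception, and the use of `None` to represent the invalid response flag are hypothetical stand-ins for hardware behaviour that the text does not specify.

```python
from enum import Enum
from typing import Callable, List, Optional


class Bus(Enum):
    A = "A"
    B = "B"


class BusError(Exception):
    """No response, parity error, or data dropout on one bus (hypothetical)."""


# Hypothetical I/O primitive: send a command word on a bus, return the data words.
Transact = Callable[[Bus, int], List[int]]


def other_bus(bus: Bus) -> Bus:
    """Return the redundant partner of the given bus."""
    return Bus.B if bus is Bus.A else Bus.A


def exchange(command_word: int, active_bus: Bus,
             transact: Transact) -> Optional[List[int]]:
    """Try the active bus, then the other bus; return None if both attempts fail."""
    for bus in (active_bus, other_bus(active_bus)):
        try:
            return transact(bus, command_word)
        except BusError:
            continue  # abandon this attempt and fall through to the other bus
    return None  # caller flags an invalid response and moves to the next message


def failing_on_a(bus: Bus, word: int) -> List[int]:
    """Test double: bus A never responds, bus B echoes the command word."""
    if bus is Bus.A:
        raise BusError("no response")
    return [word]


result = exchange(0x0821, Bus.A, failing_on_a)
```

Returning a sentinel rather than raising lets the caller move on to the next scheduled transmission on the channel, as the text describes.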

Message transfer rates of 20, 10, 5, and 1 Hz are provided to match the execution rates of the
modules which generate or use the data. On-demand transfers, such as weapon release, also are utilized.
The Mission Computers have an independent I/O processor for the multiplex channels, permitting full use
of the computer CPU for processing tasks during I/O. However, transfers are controlled by the CPU to
ensure that inputs, processing, and outputs occur sequentially for each rate. Control of the multiplex
system is transferred between the two MCs based on priority of need.
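One common way to realize several harmonic transfer rates is a minor-frame counter running at the fastest rate. The sketch below assumes a 20 Hz base frame and rate groups of 20, 10, 5, and 1 Hz; the paper does not say how the MC actually schedules its rate groups, so the function and its frame-numbering scheme are illustrative only.

```python
BASE_RATE_HZ = 20            # assumed fastest rate; one minor frame = 1/20 s
RATES_HZ = (20, 10, 5, 1)    # rate groups, fastest first


def rates_due(frame):
    """Return the rate groups whose transfers are due in the given minor frame.

    A group running at r Hz is due every BASE_RATE_HZ // r frames.
    """
    return [r for r in RATES_HZ if frame % (BASE_RATE_HZ // r) == 0]
```

Over any 20 consecutive frames (one second), each group is serviced exactly as many times as its rate in hertz.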




3.  MISSION  COMPUTER  SOFTWARE  DEVELOPMENT 


The  F/A-18A  Hornet  software  development  process  was  based  on  testing  the  flight  program  before, 
during,  and  after  the  actual  coding  of  the  program.  Figure  (7)  shows  the  five  major  phases  of  the 
software  development. 

o Phase 1 consisted of creating FORTRAN models of selected equations, algorithms, and mode control.
These models were tested in the Software Development Facility to provide the analytical
validation of the equations and algorithms to be used in the OFP.

o In Phase 2, a FORTRAN model of the baseline design was used at the McDonnell Douglas Cockpit
Simulator Facility to evaluate the interface with the pilot and to test the mechanization
proposed for the weapon system. This step provided vital confirmation of design adequacy at an
early stage and allowed alternate approaches to be studied.

o In Phase 3, the mathflows were coded in the CMS-2M language system and the program compiled in
the Software Development Facility. The object programs were tested in the Software Test
Facility. At this point, MC hardware/software inconsistencies were isolated and corrected,
leading to preliminary confirmation of correct OFP software and MC equipment integration. The
McDonnell Douglas Software Test Facility was used to monitor and control the integration.

o As other avionics equipment arrived, Phase 4 tested the Mission Computer and associated software
with actual interfacing equipment. This step provided integration of the MC and the MC OFP with
the individual avionics equipment, followed by integration with groups of related equipment.

o Phase 5 then reintroduced the man-in-the-loop to verify the total man/machine system. This
phase used the McDonnell Douglas cockpit simulator with the OFP running in the Mission Computers
along with the actual flight hardware for the controls and displays. The chief test pilot flew
the first flight profile at the cockpit simulator prior to aircraft first flight. The MC OFP
was then thoroughly evaluated during subsequent flight tests, which are the final measure of its
performance.

4.  F/A-18A  SOFTWARE  DEVELOPMENT  FACILITIES 

The F/A-18A Hornet integrated software development process, discussed above, made use of three
separate software facilities:

o  Software  Development  Facility 
o  Software  Test  Facility 
o  Cockpit  Simulator  Facility 

4.1 Software Development Facility (SDF)

The Software Development Facility is a modest-size data processing facility. It uses an IBM
System/370 commercial computer system and standard peripheral equipment, operating system, and language
processors (see Figure (8)). This facility is used for all FORTRAN processing, database processing, and
compilation/assembly of airborne MC programs.

Figure (9) is a block diagram of the Software Development Facility showing the IBM S/370 mainframe
and associated peripherals. The facility includes the following equipment:

o (1) IBM 370/138 Computer (512K memory)
o (4) 100 megabyte disk drives
o (2) magnetic tape drives
o (1) printer
o (1) card reader
o (5) CRT/keyboard terminals

The following software is currently active in the facility:


System Software

o VM/CMS Operating System
o IBM System/370 Assembler
o FORTRAN H Compiler
o FORTRAN H Library
o SORT Utility
o SCRIPT Word Processor
o Display Editor
o Plotter and Tablet Support
o Graphics Attachment Support
o Basic System Extension

Avionics Support Software

o CMS-2M HOL Compiler/Assembler
o MACRO-20/14 Assembler
o CMS-2M SYSGEN
o AN/AYK-14 Functional Simulator (SIM-14)
o Avionics Environment, Sensor, and Display Models
o Database Catalog Program
o Operational Flight Program Tape Generator
o PATHFIND Program
o Display Compiler
o Software Configuration Control Program


4.2 Software Test Facility (STF)

The Mission Computer Software Test Facility is a minicomputer-controlled, real-time simulation and
test facility used to test the airborne Operational Flight Program (OFP) in the MC and to integrate the
MC and its OFP with the other avionics with which they interface. The STF accomplishes this by
simulating the inputs to the MC and sending them out over the Avionics MUX in response to the MC requests
for data from various aircraft sensors. The MC processes these inputs as though it were flying in an
aircraft and then issues output data to the simulated sensors and to the cockpit displays. In general,
the input sensors are all modeled in software in the minicomputer, whereas the CRTs used to display the
MC outputs are the actual displays used in the cockpit. This provides a realistic input signal
environment for the MC and a realistic display of MC outputs for test and evaluation by the engineers and
programmers. Figure (10) is a block diagram of the STF.

4.2.1  STF  Hardware 

The hardware in the McDonnell Douglas STF is divided into three major benches plus the host
computer system with its peripherals, as identified below:

Host Computer Group (See Figure (11))

(1) HARRIS/7 Minicomputer
(2) Disk Storage Modules
(1) Magnetic Tape Unit
(1) Card Reader
(1) Line Printer
(4) CRTs (System Console and General User)
(1) VERSATEC Printer/Plotter

General Purpose Interface Bench (See Figure (12))

(1) High-Speed CRT and Keyboard
(2) Simulation Control Panels
(1) Analog/Discrete Interface Unit
(1) Aircraft Stick/Throttle
(1) Radar Interface Simulator

Avionics Integration Bench (See Figure (13))

(1) Communications System Control (CSC)
(1) Stores Management Set (SMS)
(1) Maintenance Signal Data Recorder (MSDR)
(1) Head-Up Display (HUD)
(1) Multipurpose Display Group (MDG) (three cockpit CRTs)
(1) Up-Front Control

MC Integration Bench (See Figure (13))

(2) Mission Computers
(1) Multiplex and Discrete Interface
(1) Control Keyboard
(1) Magnetic Tape Drive
(2) Lab CRT Displays
(1) MUX Monitor and Peripheral Simulator
(1) Interface for MC Support Channel

4.2.2  STF  Software 

The environment and avionics simulation software provides realistic real-time inputs for the
Mission Computers by simulating the aircraft environment (e.g., Atmosphere, Equations of Motion) and the
aircraft avionics subsystem (e.g., Radar, Air Data Computer, Inertial Navigation Set). There are four
major divisions of these functions:

Scheduler

Aircraft Environment Modules

    Atmosphere
    Autotrim/Autopilot
    Inertia, Forces, Moments
    Equations of Motion
    Target
    Aerodynamics

Aircraft Avionics Subsystems Modules

    Radar
    Data Link (D/L)
    Inertial Navigation Set (INS)
    Air Data Computer (ADC)
    Laser Spot Tracker (LST)
    Forward Looking Infrared (FLIR)
    Flight Control Computer (FCC)
    Maintenance Signal Data Recorder (MSDR) (includes engine interface)
    Stores Management Set (SMS)
    Built-In Test (BIT)
    Communication System Controller (CSC) (includes interfaces for Instrument Landing System and TACAN)

Support Functions

    Environment, Sensor, and Display Model Subroutines
    Input/Output Conversion Subroutines
    Aerodynamic Library Subroutines
    Multipurpose Display Group Subroutines

4.3 Cockpit Simulator Facility


The Cockpit Simulator Facility, Figure (14), is a laboratory complex oriented primarily to manned,
real-time flight simulation. It includes a CDC Cyber 175 computer, four crew stations, terrain maps,
horizon and target displays, and associated hardware. Each crew station includes complete flight
controls and instruments and is located in a forty-foot fiberglass dome. Target and terrain imagery is
projected on the dome and presented in the cockpit on software-driven displays or actual flight display
equipment. Both visual and sensor (electro-optical, infrared, radar) imagery is supported. The
facility is used for weapon system design, pilot training, tactics development, and effectiveness assessment.


5.  SUMMARY 


In summary, the F/A-18A Tactical Airborne Computational Subsystem is a distributed computer system.
The mission-oriented computations are performed in two central Mission Computers and the sensor-oriented
computations are performed in distributed processors in the sensor and display equipment. The memory in
each Mission Computer can be doubled from 64K to 128K words within the present equipment envelope for a
total combined capability of 256K words. This memory growth, along with the flexibility of the MC
multiplex input/output system and the distributed partitioning of the sensor and display computations, makes
the F/A-18A computational subsystem easily adaptable to changes and expansions in F/A-18A mission
requirements and ready to share a long and successful future with the F/A-18A aircraft.

References 

1. Griffith, V. V., Kelfer, L. F., Paxhla, E. C., et al., "Aircraft Avionics Trade-Off Study (AATOS),"
McDonnell Aircraft Co., St. Louis, Mo., ASD/XR 73-20 Final Report, Nov. 1973.

2. Flnka, H. G. and Rosenkoetter, E., "Aircraft Avionics from the Aircraft Manufacturer's Point of
View," McDonnell Aircraft Co., St. Louis, Mo., MCAIR 73-023, Sept. 1973.

3. McTigue, T. V., "F-15 Computational Subsystem," AIAA JOURNAL OF AIRCRAFT, Vol. 13, No. 12, Dec.
1976, pp. 945-947.




FIGURE 2. F/A-18A MISSION vs SENSOR COMPUTATIONS

FIGURE 12. F/A-18A SOFTWARE TEST FACILITY GENERAL PURPOSE INTERFACE BENCH

FIGURE 13. F/A-18A SOFTWARE TEST FACILITY INTEGRATION BENCHES

FIGURE 14. F/A-18A COCKPIT SIMULATOR FACILITY



F/A-18  WEAPONS  SYSTEM  SUPPORT  FACILITIES 


Thomas  F.  O’Neill 
Naval Weapons Center
F-18 Facility Branch (Code 3114)
China  Lake,  CA  93555,  U.S.A. 


SUMMARY 

The U.S. Navy is currently acceptance-testing the McDonnell Douglas F/A-18 aircraft. Since the F/A-18 is so much more
complex than any aircraft currently deployed, more sophisticated support tools will be required. The main support tool
will be a weapons system support facility. This facility will have all of the hardware and software necessary to test, modify,
and validate all of the avionics hardware, software, and firmware. A distributed processing approach is used in the facility,
which contains several minicomputers and super minicomputers.

1.  INTRODUCTION 

The McDonnell Douglas F/A-18 aircraft is an all-weather fighter/attack aircraft capable of hosting a wide range of ordnance.
Figure 1.1 shows the F/A-18 with some of the ordnance it can deliver. Upon its acceptance into the Fleet, the Navy will
assume responsibility for the modification, test, and certification of the avionics, software, and weapon systems in the
aircraft. The Naval Weapons Center (NWC), China Lake, Calif., has the responsibility to provide system engineering, system
integration, software development, configuration management, and test and evaluation support throughout the life cycle of
the F/A-18 aircraft.

In  late  1978,  the  Weapons  System  Support  Activity  was  formed  at  NWC  to  provide  system  engineering  support  to  the 
aircraft. 

A  specialized  avionics  support  facility  has  been  tasked  by  the  Weapons  System  Support  Activity  to  provide  weapon  system 
support  during  all  phases  of  the  weapon  system  life  cycle. 


FIGURE  1.1.  The  F/A-18  and  some  of  its  ordnance. 

2.  SUPPORTING  THE  F/A-18 

With the addition of the F/A-18, NWC becomes the proving ground for an increasing majority of the fleet of
fighter/attack aircraft within the Navy.

NWC  has  overcome  a  number  of  problems  normally  associated  with  verification  and  validation  with  the  inception  of  a 
unique  weapons  system  support  facility.  The  long-term  success  of  this  support  facility  approach  has  been  demonstrated 
in  a  variety  of  other  programs,  including  those  for  the  A-7,  A-6,  A-4,  and  AV-8B  aircraft. 

Validation through the use of test flights is impractical because of time and cost factors. By providing a work station
in which the various subsystems of the aircraft can be modified and rigorously tested, NWC has succeeded in reducing
cost and manpower requirements dramatically.




2.1 The Weapons System Support Facility

During the life cycle of the F/A-18, many changes will have to be made to the avionics software. These changes may
be made in response to problems detected or new capabilities desired by the Fleet, or may occur with the addition of
new equipment to the aircraft.

The  Weapons  System  Support  Facility  (WSSF)  will  contain  the  avionics  processors,  commercially  available  computers,  and 
the  hardware  and  software  necessary  to  test  the  operational  flight  programs.  The  facility  then  becomes  the  main  tool  for 
validation.  Any  change  to  the  avionics  equipment  will  be  tested  in  the  facility  before  its  incorporation  into  the  aircraft. 

The facility will provide two broad categories of software: simulation and support. The simulation is a high order language
program package that, given the same inputs, will produce an output identical to that of the avionics subsystems. The
facility user can then work with a mixture of simulated and real avionics. The support software consists of programs that
allow the engineer to generate, test, and validate new load modules for the avionics computers.

Since the avionics computers are not designed for software development, the facility’s computers must provide the ability
to modify the source code for the flight programs, compile or assemble them, and then form a load module that the
avionics computers can utilize. The tools necessary to implement this process include cross compilers and cross assemblers
to translate the source code into an object file, and some form of linker/loader to create a load module from the individual
object code files.

2.2  The  F/A-18  Aircraft 

The F/A-18 is a sophisticated, high-performance aircraft that is, in itself, a distributed processing system. There are
approximately 30 computers with a total of 700,000 words of program storage in the aircraft. These computers range in size
from microprocessors with their programs in read-only memory to general-purpose computers with more than 256K words of
random-access memory and disk memory. Each subsystem in the aircraft (for example, the inertial navigation system, stores
management set, and the radar) is a separate, self-contained computer that uses a dual redundant MIL-STD-1553 bus to
communicate with the two AYK-14 mission computers.

The AYK-14s act as bus controllers while the other computers in the aircraft respond to the commands from the AYK-14s
as remote terminals. That is, the AYK-14s act as “traffic directors”; all data on the 1553 bus either comes from or goes to
the AYK-14s.

The F/A-18 cockpit is designed to give the pilot a visual display of all information concerning the operation of the aircraft.
A control panel could not be provided for each subsystem because of the prohibitive number of subsystems present within
the aircraft. Therefore, the solution was to place three cathode ray tubes in the cockpit, and drive these displays by two
microprocessor generators. These displays are surrounded by 20 pushbutton switches, which are used to select the
information to be displayed, change the modes of the avionics computers, and select the weapons to be dropped. Typical
displays are shown in Figure 2.1.

FIGURE 2.1. The F/A-18 cockpit.


2.3.  Functional  Requirements 

The  following  are  functional  requirements  to  which  the  Weapons  System  Support  Facility  must  respond: 



(a) The facility must provide the ability to interchange simulated models of the avionics subsystems with the real
avionics hardware. This must be accomplished in such a manner that the real avionics hardware cannot detect that the
rest of the aircraft, as well as the world, is being simulated.

(b) The 1553 bus traffic has to be realistic. This requires that the hardware interface between the facility computers
and the 1553 bus has to respond in the prescribed manner, and that the data that the simulation generates must be in the
correct format.

(c) The software written for the simulation must be in some high order language (HOL). The facility is expected to
support the F/A-18 program well into the 1990s. Over the life of the laboratory, if the software is not readable and easily
modified, the cost would become astronomical. This requirement has a major impact on the design of the laboratory
because the avionics software is mostly coded in assembly language. The amount of memory and time it would take for
an HOL simulation of assembly language programs of this size becomes a major concern.

(d) In order to host the cross assemblers, cross compilers, and the linker/loaders necessary to develop software for the
avionics computers, a large address space is essential in the facility's computers. These programs have been developed by the
contractors who produce the avionics subsystems, and are currently hosted on IBM mainframes.

(e)  Line  printers,  magnetic  tape  drives,  large  disk  storage,  and  graphic  devices  all  must  be  provided  in  the  facility  to 
store  and  process  the  data  collected  from  the  simulation  and  from  flights. 

3.  A  DISTRIBUTED  PROCESSING  APPROACH 

Analysis of the above requirements revealed two basic approaches to the WSSF design.

The mainframe approach requires the purchase of a single computer to host the simulation and all associated tools.

The distributed minicomputer design involves the use of a number of minicomputers tied together in a distributed network
scheme.

3.1.  Advantages  of  the  Distributed  Approach 

Information gathered from other facilities revealed that the distributed approach has several advantages over the single
mainframe.

There  are  typically  three  types  of  users  who  need  access  to  the  simulation  computers: 

The simulation programmer requires computer time to code, test, debug, and integrate his software with the other
models in the simulation.

The hardware engineer needs access to the computer to interface and test new equipment.

The  simulation  user  uses  the  computers  to  run  the  programmer's  simulation  using  the  engineer's  hardware. 

In the past, limiting users to a single computer quickly resulted in scheduling problems. Multiple computers solve this
problem by dedicating an individual computer to each particular area of need.

The distributed processing approach also has the advantage of easily accommodating the addition of more minicomputers
to meet future needs. When dealing with a project the size and complexity of the F/A-18, it is impossible to accurately
assess future needs, particularly in the area of computer memory requirements. The use of minicomputers provides an
unlimited expansion capability.

3.2.  Selection  of  a  Family  of  Computers 

The Digital Equipment Corporation's (DEC) PDP-11 line of computers has been chosen as the architectural base of the
facility because it offers a broad range of computers that can meet the general support and real-time requirements in the
facility. In addition, DEC supplies a networking scheme which is extremely applicable to the facility’s distributed processing
approach.

3.3.  Testing  Tools 

A necessary function of the facility is to provide a means of testing the avionics software. The primary tool to provide
this capability will be a simulated environment that, using a combination of software models and real avionics subsystems,
appears to the avionics computers as an F/A-18 aircraft in flight. The simulated models have to provide all of the necessary
outputs in the correct format to stimulate the other models and any real avionics; if the real avionics needs sensory inputs,
the simulation computers will have to provide these stimulations.

In addition, the facility provides a macro-level emulation of the avionics computers in the form of a software package
that, using the load module of the avionics computers as input, emulates the actions of the avionics computers one
macrocode instruction at a time. This package also has the ability to set breakpoints in the execution to allow the operator to
examine the data within the program and trace the path of the program through the load module.
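The idea of instruction-at-a-time emulation with breakpoints and a path trace can be shown with a toy example. The four-opcode accumulator machine below is entirely made up for illustration; it is not the AYK-14 instruction set, and the class and method names are hypothetical.

```python
class Emulator:
    """Toy emulator: executes one instruction per step and records the path taken."""

    def __init__(self, program):
        self.program = program   # list of (opcode, operand) pairs, ending in HALT
        self.pc = 0              # index of the next instruction to execute
        self.acc = 0             # single accumulator register
        self.trace = []          # addresses of the instructions executed so far
        self.halted = False

    def step(self):
        """Execute exactly one instruction."""
        op, arg = self.program[self.pc]
        self.trace.append(self.pc)
        if op == "LOAD":
            self.acc = arg
            self.pc += 1
        elif op == "ADD":
            self.acc += arg
            self.pc += 1
        elif op == "JUMP":
            self.pc = arg
        elif op == "HALT":
            self.halted = True
        else:
            raise ValueError("unknown opcode: %r" % (op,))

    def run(self, breakpoints=()):
        """Run until HALT, or stop just before the next breakpoint address."""
        while not self.halted:
            self.step()
            if not self.halted and self.pc in breakpoints:
                break
```

After `run` stops at a breakpoint, the operator can inspect `acc` and `trace`, then call `run` again to continue.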

When the simulation is running, data will be passed from one model to another within the simulation computers and from
one avionics computer to another on the 1553 bus. The facility computers will provide real-time monitoring of selected
subsets of all of this data.




3.4.  Evaluation  Tools 

Software packages are provided to allow the system engineers to evaluate the simulation and avionics software. Specifically:

(a) Any or all of the data that is passed among computers on the 1553 bus can be recorded on magnetic media for
later data analysis.

(b) By recording the control inputs to the simulation and passing these back through the simulation at a later date,
the simulation can be forced through the same maneuvers time after time. Using this method, differences between two
versions of the avionics software can be detected.

(c) The 1553 data can be recorded while the aircraft is in flight and used at a later date to drive the simulator
cockpit displays.

(d) The facility will provide a wide range of data-reduction software, from line printer listings to user-interactive
plotting packages. These data-reduction techniques allow rigorous scrutiny of data collected during flight for the purpose of
isolating errors and specific problem areas.
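The record-and-replay method in item (b) amounts to feeding one recorded input sequence to two software versions and comparing their outputs frame by frame. A minimal sketch follows, assuming each version can be modeled as a deterministic step function `(state, control_input) -> (new_state, output)`; that interface and the function names are hypothetical.

```python
def replay(step, recorded_inputs, initial_state=0):
    """Drive one software version with recorded control inputs; return its outputs."""
    state = initial_state
    outputs = []
    for control_input in recorded_inputs:
        state, output = step(state, control_input)
        outputs.append(output)
    return outputs


def first_difference(step_a, step_b, recorded_inputs):
    """Return (frame, output_a, output_b) at the first mismatch, or None if identical."""
    out_a = replay(step_a, recorded_inputs)
    out_b = replay(step_b, recorded_inputs)
    for frame, (a, b) in enumerate(zip(out_a, out_b)):
        if a != b:
            return frame, a, b
    return None
```

Because the inputs are identical on every run, any mismatch is attributable to the software change rather than to the maneuver flown.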

3.5.  General  Support  Capability 

Because of the large address space required by certain software packages, such as cross assemblers and cross compilers, a
DEC VAX 11/780 was purchased. The VAX is a 32-bit, virtual memory computer with the disks, magnetic tape drives, and line
printers needed to support the avionics software generation tools. A backend graphics processor has been added to facilitate
data reduction. This relieves the VAX of the heavy processor load normally associated with graphics software. Figure 3.1
is a schematic layout of the VAXs used in the facility.

FIGURE 3.1. DEC VAX 11/780 schematics (panels: host capability, WSSF/ASDS, WSSF/VCMS).


3.6. Real-Time Stations


In the area of general support, the only requirement is that the work be completed. There are, however, real-time aspects
of the facility whereby the work must be completed within a certain time frame. These real-time requirements indicated
that three types of work stations would be required: the integration, validation, and special function stations.

3.6.1.  The  Integration  Work  Station 

Error correction or the addition of hardware requires that a change be made to the avionics programs. The initial testing
of any modification is made in the integration station. If a change must be made to the avionics software or hardware, the
engineer will devise a solution, be it a simple software fix or something as complicated as a new weapon system. The
integration station will be used to test this modification until the performance meets with the engineer’s approval.

3.6.2. The Validation Work Station

Once the software or hardware has passed the initial testing in the integration station, it is transferred to the validation
station. Just as the name implies, validation involves rigorous testing for the purpose of ensuring that the modification made
has corrected the known errors without introducing any additional ones.




3.6.3.  Special,  Dedicated  Work  Stations 

Dedicated work stations are necessary to intensely test individual subsystems of the aircraft. In these stations, it is not
the entire weapon delivery system that is being tested, but rather the software in only one subsystem. Currently planned
are work stations for the stores management set and the radar.

4.  THE  FACILITY  AT  NWC 

The PDP-11/60 computer has been chosen as the basic building block of the facility. The 11/60 is a 16-bit minicomputer
with a maximum of 128K words (K equals 1024) of memory. Using the RSX-11M operating system allows both software
development and real-time responsiveness. The design of the integration station is shown in Figure 4.1.


4.1. 




The facility computers must communicate with the avionics subsystems for the real-time work stations to function properly;
this requires an interface between the facility computers and the 1553 bus. This interface is known as a multiple remote
terminal because it is capable of responding on the bus as if it were several avionics subsystems. The multiple remote
terminal is a microcontroller-based, direct-memory-access device. Once this device is enabled, it handles all of the bus traffic
with no load on the facility computers.

There are two types of remote terminals in the facility: one that responds to commands on the 1553 bus as the two
microprocessors for the display group, and one that responds as the rest of the avionics subsystems. The display group remote
terminal responds to commands from the AYK-14s to automatically build a display file in PDP memory, perform checksum
calculations to verify the data transfers, and transmit status information concerning the display group.

The other remote terminal is capable of acting as either the bus controller (for testing purposes) or as 16 remote terminals.
It is through these simulated remote terminal ports that the simulation communicates with the AYK-14s.

4.2. The Real-Time Computers

Central to the facility is a PDP-11/60 used exclusively for software development and simulation control. Tracking the
development of the various models would be unnecessarily complicated if each programmer were to write code on a
different computer. Therefore, software development is limited to this single “HOST” 11/60. Baseline versions of the models
are kept in one set of accounts, and the development versions of the models in another.

Because the simulation runs in multiple computers, a simulation executive has been written to run on the “HOST” 11/60.
When the user wants to run the simulation, he starts this executive, which in turn requests the following information: the
name of the file containing a list of the models the user wants to run, whether the models are baseline or development,
and which computer each model is to run in. The executive then starts the models in the appropriate computers. At this
time, the user can release the simulation to run at its 50-millisecond cycle time, single-step the simulation (this mode is
useful for debugging), or request the time it takes for each model to run. The executive will calculate the total run time for
the models the user has chosen and will disallow the configuration if this total exceeds the 50-millisecond maximum.
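The executive's timing check can be sketched as a simple budget test. The text does not say whether the total is taken per computer or across all computers; the sketch below assumes per computer, since the models run in parallel on separate machines. The data layout and function names are hypothetical.

```python
from collections import defaultdict

FRAME_MS = 50.0  # simulation cycle time stated in the text


def load_per_computer(assignments, run_time_ms):
    """Sum the model run times on each computer.

    assignments: model name -> computer name
    run_time_ms: model name -> measured run time in milliseconds
    """
    load = defaultdict(float)
    for model, computer in assignments.items():
        load[computer] += run_time_ms[model]
    return dict(load)


def configuration_allowed(assignments, run_time_ms, frame_ms=FRAME_MS):
    """Return True when no computer's total run time exceeds the frame time."""
    loads = load_per_computer(assignments, run_time_ms)
    return all(total <= frame_ms for total in loads.values())
```

Moving one model to a less loaded computer is then enough to turn a rejected configuration into an accepted one.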

It was estimated that five 11/60s are needed to run the simulation of the avionics equipment including the AYK-14s and
the world (models of the earth and atmosphere are needed for proper simulation of the airframe characteristics). This
estimate is based on the number of models, the cycle time of the simulation, and the fact that certain models need to run
in the same computer.



A sixth computer drives the display units. Since the display group has a very powerful instruction set, one 11/60 is
dedicated to processing the display data received on the 1553 bus from the AYK-14s.

Yet another 11/60 is dedicated to hardware development. The hardware engineer can develop his interfaces on this machine,
and if the device ever crashes the computer, he can simply reboot without affecting other users.

4.3.  Multiport  Memory 

Because the simulation tasks are not all located within the same computer, there has to be a means of communicating data
from one model in one computer to another model in a different computer. Conventional communication schemes were not
appealing since they were relatively slow (on the order of milliseconds per transfer). To solve this problem, a multiport
memory has been designed and fabricated.

This memory unit is 8192 words of random-access memory which contains eight computer ports. Each port can interface
with a different computer, thereby allowing up to eight computers to communicate with each other at memory access
speeds (on the order of microseconds, a factor of 1000 better than other communication schemes). This memory is fast
enough so that if all eight ports request data simultaneously, all requests will be granted within the normal memory access
speed of the 11/60 memory system. There is a self-contained arbitrator in the common memory unit to resolve multiple,
simultaneous data requests.
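The text does not say how the arbitrator chooses among simultaneous requests. As a minimal sketch, assuming a rotating-priority (round-robin) policy, which is an assumption and not a description of the actual hardware, the behaviour could be modelled as follows. The class and function names are hypothetical.

```python
NUM_PORTS = 8  # eight computer ports, per the text


class RoundRobinArbiter:
    """Grants one port per memory cycle, rotating priority after each grant.

    Round-robin is an assumed policy; the source only states that an
    arbitrator resolves multiple simultaneous requests.
    """

    def __init__(self, num_ports=NUM_PORTS):
        self.num_ports = num_ports
        self.next_port = 0  # port with highest priority in the next cycle

    def grant(self, requesting):
        """Pick one port from the requesting set, or None if it is empty."""
        for offset in range(self.num_ports):
            port = (self.next_port + offset) % self.num_ports
            if port in requesting:
                self.next_port = (port + 1) % self.num_ports
                return port
        return None


def service_all(arbiter, requesting):
    """Return the order in which simultaneous requests are granted."""
    pending = set(requesting)
    order = []
    while pending:
        port = arbiter.grant(pending)
        order.append(port)
        pending.remove(port)
    return order


print(service_all(RoundRobinArbiter(), range(NUM_PORTS)))
```

With rotating priority, no port can be starved by a busier neighbour, which is one reason such a policy is a common choice for shared-memory arbitration.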

Error logging and diagnostic ports have been built in to facilitate debugging of both hardware and software errors. These
ports allow errors to be collected for detection of error-causing conditions within the memory system.

4.4. The Displays

The  simulation  of  the  two  display  microprocessors  is  complicated  by  the  addition  of  an  out-the-window  display  behind  the 
head-up  display.  This  out-the-window  view  has  been  added  to  increase  the  realism  of  the  work  station.  Figure  4.2  shows 
the  three  displays  and  the  out-the-window  view. 


FIGURE  4.2.  The  integration  cockpit. 

The test flights that are currently planned for the F/A-18 when it arrives at NWC will cover several thousand square miles.
To be able to display all of the areas the pilot will be looking at during these flights, software has been written to digitize
the background data for the western half of the United States. Data is constantly being added to the background to
increase the detail.

The display of the graphic data (the background and the simulation of the avionics displays) is divided between two
graphical processors. The background is displayed by an Evans & Sutherland Picture System II. This is a highly capable
graphics system that displays data in three dimensions, with the processor handling the time-consuming work involved in the
generation of three-dimensional data.

The  simulation  of  the  avionics  displays  is  performed  on  an  ADAGE  4145,  which  is  a  two-dimensional  device  having  the 
features  of  a  user-programmable  writable  control  store  (WCS).  The  WCS  can  then  be  programmed  so  that  the  ADAGE 
performs  as  though  it  were  two  avionics  display  generators. 




4.5. The Static Control Panel

Figure 4.3 shows another integral part of the cockpit work station, the static control panel. This "static panel" is a series
of control switches and light-emitting diode displays which allow operator control over certain parameters affecting the
aircraft. By dialing in the roll, pitch, and heading of the aircraft, the user can effectively fly the simulator from this panel.
The simulation is constantly monitoring the panel to see if the operator has selected control of any of the more than
30 parameters available. If a particular variable is not selected for operator control, the simulation will calculate it; if, on
the other hand, the operator does want control, then the simulation receives the value for the parameter from the panel
inputs. In this way, the aircraft can be frozen in space and any of the parameters can be varied in small, controlled steps.
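The selection logic described above amounts to a per-parameter override. The following is a minimal sketch under assumed names: the function, the parameter names, and the values are hypothetical and are not taken from the facility software.

```python
def resolve_parameters(computed, panel_selected, panel_values):
    """Merge simulation-computed values with operator-selected panel values.

    computed       -- parameter name -> value calculated by the simulation
    panel_selected -- set of parameter names switched to operator control
    panel_values   -- parameter name -> value dialed in on the panel
    """
    resolved = {}
    for name, value in computed.items():
        if name in panel_selected:
            # Operator has taken control: use the panel input.
            resolved[name] = panel_values[name]
        else:
            # Not selected: keep the value the simulation calculated.
            resolved[name] = value
    return resolved


# Hypothetical example: the operator holds roll at 15 degrees while the
# simulation continues to compute pitch and heading.
computed = {"roll": 1.0, "pitch": 2.0, "heading": 90.0}
print(resolve_parameters(computed, {"roll"}, {"roll": 15.0}))
```

Selecting every parameter at once corresponds to the "frozen in space" case in the text, where the operator steps individual values by hand.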


FIGURE 4.3. The static panel.


4.6.  Data  Logging 

Hardware and software will have to be built and written to interface with the 1553 bus and collect all or any part of the
data that is being passed from one computer to another. The data will be collected, time tagged, and then recorded directly
onto magnetic media for later analysis.
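The collect, time-tag, and record sequence can be illustrated with a small sketch. The record layout below is an assumption made for the example (it is not the MIL-STD-1553 message format and not the planned NWC design), and the function names are hypothetical.

```python
import io
import struct

# Assumed record layout (not from the source): 64-bit time tag in
# microseconds, 16-bit source id, 16-bit word count, then 16-bit data words.
HEADER = struct.Struct("<QHH")


def log_message(stream, time_us, source_id, words, wanted_sources=None):
    """Time-tag and append one bus message; skip sources not selected.

    Passing wanted_sources=None records everything ("all ... of the data");
    passing a set records only part of it.
    """
    if wanted_sources is not None and source_id not in wanted_sources:
        return False
    stream.write(HEADER.pack(time_us, source_id, len(words)))
    stream.write(struct.pack(f"<{len(words)}H", *words))
    return True


def read_log(stream):
    """Yield (time_us, source_id, words) tuples from a recorded stream."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        time_us, source_id, count = HEADER.unpack(header)
        words = struct.unpack(f"<{count}H", stream.read(2 * count))
        yield time_us, source_id, list(words)


# An in-memory buffer stands in for the magnetic media.
media = io.BytesIO()
log_message(media, 1000, 1, [0x1234, 0xFFFF])
media.seek(0)
print(list(read_log(media)))
```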

5.  CONCLUSION  AND  SUMMARY 

The Naval Weapons Center has a long and successful history of supporting aircraft using the Weapons System Support
Facility approach. Facilities to support the newer avionics systems must be more complex and cost effective, and must
support a project from initial verification through Fleet maintenance. By providing a WSSF for the F/A-18 aircraft, NWC
will succeed in reducing long-term cost and manpower requirements dramatically.


DISCUSSIONS
SESSION VII


REFERENCE NO. OF PAPER: VII-32
DISCUSSOR'S NAME: Alan Stern, Boeing Co.
AUTHOR'S NAME: K. Moses


COMMENT: In the event of an in-flight reconfiguration, how is new software loaded and how is it
assured to be correct?

AUTHOR'S REPLY: In-flight reprogramming of software (as distinct from reconfiguration of LRU's) is not
contemplated.


REFERENCE NO. OF PAPER: VII-32
DISCUSSOR'S NAME: Dr. A. A. Callaway, RAE
AUTHOR'S NAME: K. Moses

COMMENT: One of the rumors we hear coming out of the USA is the requirement by the USAF that the MIL-STD-1750A
Instruction Set Architecture be used for all future embedded computer applications. Is the SIFT
concept compatible with the 1750A ISA?

AUTHOR'S REPLY: The SIFT concept is compatible with the 1750A ISA. The problem might be to find the
processor that has the 1750A ISA and the required speed for flight control computers. We, in the US,
hear the same rumor, and hope that it's not true.


REFERENCE NO. OF PAPER: VII-33
DISCUSSOR'S NAME: G. Hall, Sweden
AUTHOR'S NAME: Nelson

COMMENT: Have I understood it right that this can be used only on machine-language level?

AUTHOR'S REPLY: Yes, for those functions that involve instruction by instruction analysis. However,
many of the functions are more global in nature. SOVAC could use information from a compiler symbol
table and loader map to allow most functions to be used with high-level languages.

In fact, if the SOVAC software had full access to the data available from a compatible high-level
compiler, it could perform all corresponding SOVAC functions for the high-level language user.


REFERENCE NO. OF PAPER: VII-33
DISCUSSOR'S NAME: Richard Schwartz, SRI, USA
AUTHOR'S NAME: H. Nelson

COMMENT: In what language is your application software written? As you move to higher level
languages, do you expect SOVAC to still be useful? Would you attempt to support design aids taking
advantage of the use of a higher level language?

AUTHOR'S REPLY: (1) The SOVAC software is written in PASCAL. (2) SOVAC has the ability to recognize
and collect data on all observable activity in the tactical computer. Thus, it is primarily an issue of
SOVAC software development to give it high-level language support capability. (3) In the future, we
plan to develop SOVACs for machines using high-level languages. As we develop the requirements, we
will investigate reasonable high-level aids. I expect it will be able to fully support the needs of
the high-level language user.


REFERENCE NO. OF PAPER: VII-33
DISCUSSOR'S NAME: M. Mansell, British Aerospace
AUTHOR'S NAME: Harvey G. Nelson

COMMENT: On a number of occasions during flight development trials on Sea Harrier we experienced core
store corruptions in the main computer. Have you experienced similar problems at the NWC and could you
describe whether and how the SOVAC system could be used to investigate such problems?

AUTHOR'S REPLY: We have had similar types of problems at NWC. SOVAC was developed to be able to
assist in the isolation of the source of these faults. To identify the memory locations changed, use
the verify function to compare all protected locations against the corresponding values in a reference
file. Once the locations have been identified, SOVAC can be used to watch activity associated with
those locations. Any of the SOVAC functions can be used to collect the data desired to analyze the
cause of the problem.
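The comparison step in this reply can be pictured with a short sketch. It is an illustration only: the real SOVAC verify function is not documented here, so the name `verify`, the dictionary representation of memory, and the addresses are all assumptions.

```python
def verify(memory, reference, protected):
    """List protected locations whose contents differ from the reference.

    memory    -- address -> current word read from the computer under test
    reference -- address -> expected word from the reference file
    protected -- iterable of addresses that should never change
    Returns (address, expected, actual) tuples in address order.
    """
    return [
        (addr, reference[addr], memory[addr])
        for addr in sorted(protected)
        if memory[addr] != reference[addr]
    ]


# Hypothetical three-word protected region with one corrupted location.
reference = {0x100: 0x1234, 0x101: 0x0000, 0x102: 0xBEEF}
memory = dict(reference)
memory[0x101] = 0x00FF
for addr, expected, actual in verify(memory, reference, reference.keys()):
    print(f"{addr:#06x}: expected {expected:#06x}, found {actual:#06x}")
```

The addresses reported by such a comparison are the ones that would then be watched for activity, as the reply goes on to describe.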


REFERENCE NO. OF PAPER: VII-33
DISCUSSOR'S NAME: W. R. Richards, Smiths Industries
AUTHOR'S NAME: H. Nelson

COMMENT: I am familiar with the facilities provided by universal microprocessor development
systems, in particular the Tektronix 8002 system. The facilities provided by "SOVAC" seem almost
identical. Would the speaker please comment?

AUTHOR'S REPLY: Microprocessor development systems address the same general programmer needs.
However, they are intended to be used with specific microprocessors with complete access to all needed
signal and data lines. Also, their capability is somewhat low in bandwidth.

SOVAC is designed to work with high capacity, high speed "mini" computers with less than full
access to the desired signal and data lines. SOVAC has a very high bandwidth and high capacity data
collection capability.

Finally, SOVAC has the full data collection, storage, and computational capabilities of the PDP
11/34 computer to support it. Most enhancements to SOVAC involve only changes to the SOVAC software in
the PDP 11/34.


REFERENCE NO. OF PAPER: VII-34
DISCUSSOR'S NAME: Jim McCuen, Hughes, USA
AUTHOR'S NAME: G. Wilcock

COMMENT: Has the UK as yet developed a MIL-STD for Solid State Power Controllers (SSPC)? Are there
any SSPC in production?

AUTHOR'S REPLY: The UK has not yet produced a document similar in scope to a MIL standard for Solid
State Power Controllers, although standardization is being pursued in a number of different ways.
Specifications EL2141 and EL2143 have been published (source Elec 2/4, MOD(PE)). These do not have the
authority of a MIL standard but are intended to promote information interchange between prospective
users and suppliers and subsequently to serve as a basis for particular equipment specifications. The
UK MOD is participating in the activity of the International Standards Organization which is drafting a
standard for Remote Power Controllers (ISO/TC20/SC1). British Defence Standard 00-18 (Part 4)/Issue 1
relates to discrete (on/off) signalling but includes the operation of controllers for load ratings up to
0.2A. A more comprehensive coverage of loads and ratings is being considered for future issues.

There are no Defence Standard SSPCs in production, although there are a number of different
devices in an advanced state of development. UK MOD has funded development of both ac and dc solid
state and hybrid units at Plessey, Titchfield. Development of a monolithic IC to perform the control
functions of an SSPC with increased reliability and reduced volume is also being funded (Swindon
Silicon Systems, Swindon). Initial devices are being evaluated.


REFERENCE NO. OF PAPER: VII-35
DISCUSSOR'S NAME: Schoelch, IABG
AUTHOR'S NAME: S. Croce

COMMENT: Can you give some figures on the size of the software programs, especially of the main
computer? Do you have any experience in software maintenance of this system? Which people and how
many people are involved in this business?

AUTHOR'S REPLY: (1) Les programmes du calculateur principal occupent entre 40 et 50K mots pour une
mémoire installée de 64K mots de 16 bits (plus 2 bits de parité). (2) Le système du Mirage 2000 est
actuellement en cours de développement. On procède donc à des modifications plutôt qu'à de la
maintenance. Celle-ci sera cependant effectuée par les fabricants des matériels, qui sont aussi les
"fabricants" du logiciel incorporé. Cette maintenance sera toutefois facilitée par l'utilisation d'un
langage de haut niveau (LTR) et la mise en pratique d'une méthodologie rigoureuse qui oblige les
programmeurs à réaliser la documentation en même temps que le codage.

(1) The programs of the main processor occupy from 40 to 50K words for an installed memory of 64K words
of 16 bits (plus 2 parity bits). (2) The system in the Mirage 2000 is presently in the process of
development; the work therefore consists of modifications rather than maintenance.
The latter (maintenance), however, will be accomplished by the equipment manufacturers, who are also
the "manufacturers" of the embedded software. The maintenance will nevertheless be facilitated by the use of a
high order language (LTR) and the application of a rigorous methodology which forces the programmers to
do the documenting at the same time as the coding.




REFERENCE NO. OF PAPER: VII-35
DISCUSSOR'S NAME: M. Mansell, British Aerospace, Kingston Division
AUTHOR'S NAME: B. Vandecasteele

COMMENT: On your dynamic development rig do you inject "dynamic" computer generated synthetic signals
representing a target and if you do, at what point do you inject these into the radar system?

AUTHOR'S REPLY: La stimulation peut s'effectuer de deux manières différentes:

- le radar poursuit effectivement une cible réelle; ses informations sont alors traitées par les
équipements du banc et peuvent être envoyées à l'autodirecteur du missile.

- le radar ne poursuit pas de cibles; la mission est entièrement simulée. Les échos simulés sont
alors injectés au niveau de la ligne numérique interne du radar.


Stimulation can occur in two different ways:

- The radar actually tracks a real target; its data are then processed by the test-bench
equipment and can be sent to the missile's seeker.

- The radar does not track targets; the mission is entirely simulated. Simulated echoes are then
injected at the level of the digital data line internal to the radar.


REFERENCE NO. OF PAPER: VII-38

DISCUSSOR'S  NAME:  M.  Mansell,  British  Aerospace,  Kingston  Division 
AUTHOR'S  NAME:  T.  F.  O'Neill 

COMMENT: If you wish to look at particular parameters within a mission computer computation, do you
make changes to the OFP to output data for recording, and how do you cope with flight clearance of the
OFP with this modification in and then removed when the problem has been solved?

AUTHOR'S REPLY: Two choices: (1) put the patch in, validate it and leave it in. This is useful only
if the problem is a long term one. (2) put patch in, solve the current problem, take patch out, then
validate.


REFERENCE NO. OF PAPER: VII-38
DISCUSSOR'S NAME: G. Scotti, SELENIA, Italy
AUTHOR'S  NAME:  T.  F.  O'Neill 

COMMENT:  How  many  people  are  currently  joining  the  WSSF  team?  And  how  much  man-year  effort  have  you 
spent  on  the  F18  program? 

AUTHOR'S REPLY: There are 50-55 people working full time for the WSSF. Totally, there are probably
twice that at NWC.




APPENDIX 

LIST  OF  ATTENDEES 


ACTON, A.A. Mr
    Marconi Avionics (Training Dept.) Ltd, Airport Works, Rochester, Kent ME1 2XX, UK
ATKINS, R.J. Mr
    Smiths Industries, Aerospace & Defence Systems Co., Winchester Road, Basingstoke, Hants, UK
BAHRE, R. Mr
    Fraunhofer-Institut für Informations- u. Datenverarbeitung, Sebastian-Kneipp-Str. 12-14, D-7500 Karlsruhe, FRG
BALL, Wm.F. Mr
    Head, Avionics Facilities Div., Naval Weapons Center (Code 311), Dept. of the Navy, China Lake, CA 93555, USA
BARBER, B. Mr
    ADV Team, British Aerospace P.L.C., Aircraft Group-Warton Division, Warton Aerodrome, Preston, Lancs, UK
BARTH-NILSEN, K.W. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
BENNIS, H.G.M. Mr
    Physics Laboratory TNO, Oude Waalsdorperweg 63, The Hague, The Netherlands
BRAATHE, R. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
BRAMMER, K. Dr
    ESG Elektronik-System-Gesellschaft, Postfach 800569, D-8000 München 80, FRG
BRAULT, Y. Mr
    Sous-Directeur, Thomson-CSF, 178 Bd Gabriel Péri, 92240 Malakoff, France
BROSS, P.A. Mr
    Postfach 80 05 69, Electronic-System-GmbH, Vogelweideplatz 9, D-8000 München 80, FRG
CALLAWAY, A.A. Dr
    Flight Systems Dept. Y20 Bldg., Royal Aircraft Establishment, Farnborough, Hants GU14 6TD, UK
CLEMENT, Mr
    Ferranti Ltd, Ferry Rd., Silverknowles, Edinburgh EH4 4AD, UK
CROVELLA, C. Mr
    Caselle Plant Manager, AERITALIA-Gruppo Equipaggiamenti, Esercizio di Caselle, 10072 Caselle Torinese, Italy
DANIEL, Mr
    Thomson CSF, 52 rue Guynemer, 92130 Issy les Moulineaux, France
DELEGUE, Mr
    Thomson CSF, 52 rue Guynemer, 92130 Issy les Moulineaux, France
DE WINTER, J. Capt.
    Belgian Airstaff - VDT/B, Rue d'Evere 1, 1140 Brussels, Belgium
DIAMOND, F. Dr
    Chief Scientist, Rome Air Development Ctr./CA, Griffiss AFB, NY 13441, USA
DOVE, B.L. Mr
    Head, Avionics Systems Branch, Electronics Directorate, Mail S. 477, NASA Langley Research Center, Hampton, VA 23665, USA
DUKE, P. Mr
    British Aerospace Aircraft Group, Kingston/Brough Division, Brough, North Humberside HU15 1EQ, UK
DUNCAN, I. Mr
    Ferranti Ltd, Ferry Road, Silver Knowles, Edinburgh, Scotland, EH5 2XS, UK
EIKELAND, G. Major
    Air Material Command, P.O. Box 10, 2007 Kjeller, Norway
EVANS, B. Mr
    Marconi Avionics Limited, Elstree Way, Borehamwood, Herts, UK
FAEGRI, A. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
FANTOZZI, C. Ing.
    Industrie Face Standard, Via Della Magione, 00040 Pomezia, Roma, Italy
FERRERI, J.F. Mr
    Avions Marcel-Dassault, 78 Quai Carnot, 92214 St. Cloud, France
FORGUES, M. Mr
    CIMSA, 10-12 Ave de l'Europe, 78140 Velizy, France
GANGL, E.C. Mr
    ASD/ENAI, Wright-Patterson AFB, Dayton, Ohio 45433, USA
GERHARDT, L. Prof.
    Systems Engineering, Rensselaer Polytechnic Institute, Troy, N.Y. 12181, USA
GHICOPOULOS, B. Mr
    Hellenic Air Force Technology, Research Centre (KETA), Delta Falirou, P. Faliron, Athens, Greece




GIORDANI, E. Dr
    Systems Engineering Mgr., c/o S.I.A., Via Canova 25, I-10126 Torino, Italy
GOULET, Mr
    MATRA, BP No. 1 - Ave. Louis Breguet, 78146 Velizy, Villacoublay Cedex, France
GREFFET, R. Mr
    SFIM, 13 Ave. Marcel Ramolfo-Garnier, 91301-Massy, France
GRICE, J.A. Mr
    Attn of TRC/Personnel Dept., Easams Ltd, Lyon Way, Frimley Road, Camberley, Surrey GU17 OP4
HALL, L.G. Mr
    Research Institute of National Defence, Dept. 2, Fack, S-10450 Stockholm, Sweden
HARDENBOL, A.G. Ir.
    Scientific Advisor to CINCENT, HQ AFCENT, Post Box 270, 6440 AG Brunssum, The Netherlands
HARTKE, Dipl. Ing.
    Institut für Luft- und Raumfahrt, Marchstr. 14, Sekr. F3, D-1000 Berlin 10, FRG
HAUGLAND, T. Mr
    N.D.R.E., P.O. Box 25, Kjeller, Norway
HEGER, D.
    Fraunhofer-Institut für Informations- u. Datenverarbeitung, Sebastian-Kneipp-Strasse 12/14, D-7500 Karlsruhe, FRG
HELPS, Mr
    Smiths Industries, Aerospace & Defence Systems Co., Cheltenham Div., Bishop's Cleeve, Cheltenham, Glos. GL52 4SF, UK
HOENINK, G. Mr
    Centrum Automatisering Wapen en Commandosystemen, Koninklijke Marine, Marine Postkantoor, 1780 CA Den Helder, The Netherlands
HOFVIK, L. Dr
    NDRE, P.O. Box No. 25, N-2007 Kjeller, Norway
HUNT, G.H. Dr
    Royal Aircraft Establishment, Farnborough, Hants GU14 6TD, UK
HVINDEN, O. Mr
    N.D.R.E., P.O. Box 25, N-2007 Kjeller, Norway
von ISSENDORFF, H. Dr
    Forschungsinstitut Funk & Mathematik, Königstr. 2, D-5307 Wachtberg-Werthhoven, FRG
JACOBSEN, M. Mr
    AEG-Telefunken N14/V3, D-7900 Ulm, Postfach 1730, FRG
JANIK, K. Mr
    Bundesamt für Wehrtechnik und Beschaffung, Luftfahrtgerät der Bundeswehr, Landshuter Allee 162a, 8000 München 19, FRG
JUANOLE, G. Dr
    Laboratoire d'Automatique et d'Analyse des Systèmes du C.N.R.S., 7 Ave. du Colonel Roche, 31400 Toulouse, France
KENNIS, F. Col.
    Belgian Airstaff - VDT/B, Rue d'Evere 1, 1140 Bruxelles, Belgium
KIRSTETTER, B. Dr
    Eurocontrol, rue de la Loi 72, B-1040 Bruxelles, Belgium
KISTER, H. Mr
    VDO-Luftfahrtgeräte Werk, An der Sandelmühle 13, 6000 Frankfurt-Heddernheim, FRG
KLEIH, W. Dr
    Messerschmitt-Bölkow-Blohm GmbH, FE 411, Postfach 801160, 8000 München 80, FRG
KOLSTAD, B. Mr
    Air Material Command, P.O. Box 10, 2007 Kjeller, Norway
KUHLEN, H.P. Mr
    ESG Elektronik System GmbH, Postfach 800569, Vogelweideplatz 9, 8000 München 80, FRG
LAMBRAKIS, Mrs
    KETA, Delta Falirou, Palaion Faliron, Athens, Greece
LECOQ, M. Ms
    MATRA SA, 37 Av L. Breguet, 78140 Velizy, France
LE GAC, J.Y. Mr
    DTEN/STEN, Bureau Guidage-Pilotage, 26 Boulevard Victor, 75015 Paris, France
LIE, O. Mr
    Air Material Command, P.O. Box 10, 2007 Kjeller, Norway
LIVESEY, J. Dr
    School of Information & Computer Sc., Georgia Institute of Technology, Atlanta, GA 30332, USA
LOHNERT, F. Mr
    TU Berlin, Institut f. Technische Informatik, Sekr. HH1, Einsteinufer 35-37, 1000 Berlin 10, FRG
MACKINTOSH, I.W. Mr
    Royal Signals and Radar Establishment, St Andrews Road, Great Malvern, Worcs WR14 3PS, UK
MACPHERSON, R.W. Dr
    NDHQ, CRAD 19NT, 101 Colonel By Drive, Ottawa, Ontario, K1A 0K2, Canada
MAHER, S. Lt
    AFWAL/FIGLB, Wright-Patterson AFB, OH 45433, USA
MANSELL, M. Mr
    British Aerospace Aircraft Group, Kingston/Brough Division, Brough, North Humberside HU15 1EQ, UK




MARTIN, J.T. Mr
    Bracknell Division, Ferranti Computer Systems Ltd, Western Rd, Bracknell, Berkshire RG12 1RA, UK
MAilUHN, P. Mr
    Postfach 1120, Bodenseewerk Gerätetechnik GmbH, D-7770 Überlingen, FRG
MAYES, D.J. Mr
    Smiths Industries Ltd, Bishops Cleeve, Cheltenham, Glos., UK
McCUEN, J.W.
    Sr Proj. Engineer, Systems Div., JTIDS Program Office, Hughes Aircraft Co., P.O. Box 3310, TC13, A-105, Fullerton, CA 92634, USA
McTIGUE, T.V. Mr
    Dept. 312 Bldg 271B, McDonnell Aircraft Co., P.O. Box 516, St Louis, MO 63166, USA
MEGNA, V.A. Mr
    F-8 DFBW Program Manager, The Charles Stark Draper Lab., Inc., MS #04, 555 Technology Square, Cambridge, Mass 02139, USA
MERAUD, M. Mr
    SAGEM, rue de la Tour Billy, Argenteuil 95500, France
MOIR, I. Mr
    Smiths Industries, Cheltenham Div., Bishop's Cleeve, Cheltenham, Glos. GL52 4SF, UK
MOSES, K. Mr
    Flight Systems Division, Bendix Corporation, Teterboro, NJ 07608, USA
MOWAT, A.R. Mr
    Ferranti Ltd, Ferry Rd, Silverknowles, Edinburgh, EH4 4AD, Scotland, UK
MOXEY, C. Mr
    British Aerospace Public Ltd Co., Aircraft Group Warton Division, Warton Aerodrome, Preston, Lancs PR4 1AX, UK
NELSON, H. Mr
    Naval Weapons Center, Code 3115, China Lake, CA 93555, USA
O'NIELL, Mr
    Code 3114, F-18 Facility Branch, US Naval Weapons Ctr., China Lake, CA 93555, USA
PAGANO, F. Mr
    Oto Melara S.p.a., v. Valdilocchi 15, 19100 La Spezia, Italy
PARTRIDGE, B.W. Mr
    Marconi Radar System, West Hanningfield Rd, Great Baddow, Chelmsford, Essex CM2 8HN, UK
PENERY, M.T. Mr
    EMI Electronics Ltd, R&E Div., Wells, Somerset BA5 1AA, UK
PUTZKI, R. Dr
    SCS GmbH, Oehleckerring 40, 2000 Hamburg 62, FRG
QUEMARD, J.P. Mr
    Electronique Marcel Dassault, 55 Quai Carnot, 92214 St. Cloud, France
RICHARDS, W.R. Mr
    Smiths Industries Ltd, Bishop's Cleeve, Cheltenham, Glos., UK
ROBERTS, M. Mr
    British Aerospace Aircraft Group, Kingston/Brough Division, Brough, North Humberside HU15 1EQ, UK
ROSSIGNOL, O. IA
    STTI/PN1, 129 rue de la Convention, 75731 Paris Cedex 15, France
SAGE, D.S. Mr
    Marconi-Avionics Ltd, Monks Way, Linford Wood, Milton Keynes, UK
SALTZER, J. Prof.
    Prof., Computer Science, M.I.T., Room NE 43-505, 545 Technology Square, Cambridge, Mass. 02139, USA
SANDBRAATEN, H. Major
    Air Material Command, P.O. Box 10, 2007 Kjeller, Norway
SCHLICHT, E. Mr
    ESG Elektronik System GmbH, Postfach 800569, Vogelweideplatz 9, 8000 München 80, FRG
SCHOLCH, J. Mr
    Industrieanlagen-Betriebsgesellschaft, IABG-Einsteinstrasse Geb. 21, 8012 Ottobrunn b. München, FRG
SCHWARTZ, R.L. Mr
    Computer Science Laboratory, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025, USA
SCOTTI DI UCCIO, G.A. Dr
    Proj. Mgr., Selenia S.p.a. Avionics Systems, V. dei Castelli Romani 2, Pomezia, Italy
SERRA, E. Ing.
    Elettronica San Giorgio, Via Hermada 6, 16154 Genova-Sestri, Italy
SHIN, K.G. Prof.
    Electronic-Comptr & Systems Eng Dept., Rensselaer Polytechnic Inst., Troy, NY 12181, USA
SMEDSRUD, P.B. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
SMESTAD, T. Mr
    NDRE, Div. for Electronics, P.O. Box 25, 2007 Kjeller, Norway
SORASEN, O. Mr
    NDRE, Div. for Electronics, P.O. Box 25, Kjeller, Norway
SPONHOLZ, R. Mr
    Bodenseewerk Gerätetechnik Abt FRN-EL, Postfach 1120, D-7770 Überlingen, FRG
STERN, A.D. Mr
    Mgr., Dgtl Flt Ctrls Res., MS 86-06, Boeing Military Airplane Co., P.O. Box 3707, Seattle, WA 98124, USA



STERNANG, A. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
STOEVNE, H.H. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
STRADA, J.A. Cdr
    Office of Naval Research, Box 39, FPO NY 09510, USA
SVOBODOVA, L. Mrs
    B.P. No. 105, INRIA, Domaine de Voluceau-Rocquencourt, 78153 Le Chesnay, France
SZLACHTA, M. Mr
    LITEF der Hellige GmbH, Lörracher Str. 18, Postfach 774, 7800 Freiburg, FRG
THORSEN, J.G. Mr
    A/S Kongsberg Vaapenfabrik, Boks 25, N-3601 Kongsberg, Norway
TIMMERS, H.A. Ir.
    National Aerospace Laboratory NLR, Anthony Fokkerweg 2, 1059 CM Amsterdam, The Netherlands
VAGNARELLI, F. L/Col. Prof.
    Aeronautica Militare, Ufficio Delegato Nazionale all'AGARD, P.le K. Adenauer 3, 00144 Roma-Eur, Italy
VAN KEUK, G. Dr
    Forschungsinstitut für Funk & Math., FGAN, 5307-Wachtberg-Werthhoven, FRG
VANDECASTEELE, B. Mr
    B.P. 300, 78 Quai Carnot, 92214 St. Cloud, France
VASLIN, Mr
    SFIM, 13 Ave Marcel Ramolfo-Garnier, 91301-Massy, France
VOGEL, M. Dr
    DFVLR, D-8031 Oberpfaffenhofen, FRG
WARD, A.O. Mr
    Warton Division - Warton Aerodrome, British Aerospace - Aircraft Group, Preston PR4 1AX, UK
WARR, H. Mr
    EMI Ltd, Radar House, Dawley Road, Hayes, Middlesex, UK
WEISS, M. Dr
    Aerospace Corporation, P.O. Box 92957, Los Angeles, CA 90009, USA
WHITEHOUSE, H.J. Mr
    Naval Ocean Systems Center, Code 5303, Catalina Blvd., San Diego, CA 92152, USA
WILCOCK, G.W. Mr
    EP Department, Royal Aircraft Establishment, Farnborough, Hants GU14 6TD, UK
WOLF, J.K. Prof.
    Dept. of Elec. & Comp. Eng., University of Mass., Amherst, Mass 01003, USA
WRIGHT, S.M. Mr
    Systems Design Office, British Aerospace, Aircraft Group, Kingston-Brough Division, Brough, North Humberside HU15 1EQ, UK
YOUNG, N. Dr
    Ultra Electronic Controls Ltd, 136 Mansfield Rd, Western Ave., London W3, UK
ZEMPOLICH, B.A.
    Dep. Tech. Admin. for Command, Ctrl & Guidance Research & Tech. Gp. NASC, Naval Air Systems Command (AIR-360B), Washington D.C. 20361, USA


REPORT DOCUMENTATION PAGE

1. Recipient's Reference:
2. Originator's Reference: AGARD-CP-303
3. Further Reference: ISBN 92-835-0302-3
4. Security Classification of Document: UNCLASSIFIED
5. Originator: Advisory Group for Aerospace Research and Development,
   North Atlantic Treaty Organization,
   7 Rue Ancelle, 92200 Neuilly sur Seine, France
6. Title: TACTICAL AIRBORNE DISTRIBUTED COMPUTING AND NETWORKS
7. Presented at: a Meeting of the Avionics Panel held in Røros, Norway, 22-25 June, 1981.
8. Author(s)/Editor(s): Various
9. Date: October 1981
10. Author's/Editor's Address: Various
11. Pages: 434
12. Distribution Statement: This document is distributed in accordance with AGARD
    policies and regulations, which are outlined on the
    Outside Back Cover of all AGARD publications.
13. Key words/Descriptors:
    Computer systems hardware
    Design criteria
    Data links
    Reliability
    Switching theory
    Avionics
    Computer programs

14. Abstract

These proceedings consist of the papers and discussions presented at the Avionics Panel
Meeting on "Tactical Distributed Computing and Networks" held in Røros, Norway,
22-25 June 1981. The 35 papers were divided as follows: three on state-of-the-art;
five on system architecture; four on system design approaches; five on software; five
on fault tolerance and reliability; six on interconnection, bussing and networking;
seven on applications to avionics systems.


